* [14/n] PR85694: Rework overwidening detection
@ 2018-06-20 10:37 Richard Sandiford
From: Richard Sandiford @ 2018-06-20 10:37 UTC
  To: gcc-patches

This patch is the main part of PR85694.  The aim is to recognise at least:

  signed char *a, *b, *c;
  ...
  for (int i = 0; i < 2048; i++)
    c[i] = (a[i] + b[i]) >> 1;

as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints.  However, it ended up being a lot more
general than that.
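
For reference, the narrowed form being aimed at would look something
like this (a hand-written sketch, not compiler output):

  for (int i = 0; i < 2048; i++)
    {
      short as = (short) a[i];
      short bs = (short) b[i];
      short sum = as + bs;              /* needs at most 9 bits */
      c[i] = (signed char) (sum >> 1);
    }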

The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant.  These cases are enough for common pixel-format conversion
and can be detected in a peephole way.
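For example (an illustrative case rather than one taken from the PR):

  for (int i = 0; i < 2048; i++)
    b[i] = (a[i] & 0xf0) >> 4;

where a[] and b[] are unsigned char: each statement makes it locally
obvious that the int operations fit in a byte.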

The loop in the opening example requires two generalisations of the
current code: support for addition as well as logical ops, and support
for non-constant second operands.  These are harder to detect in the
same peephole way, so the patch tries to take a more global approach.

The idea is to get information about the minimum operation width
in two ways:

(1) by using the range information attached to the SSA_NAMEs
    (effectively a forward walk, since the range info is
    context-independent).

(2) by back-propagating the number of output bits required by
    users of the result.
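
To give a made-up example of the two sources of information:

  unsigned char *a;
  unsigned short *out;
  ...
  int x = a[i];                  /* (1): range [0, 255] */
  int y = x * 3;                 /* (1): range [0, 765] */
  out[i] = (unsigned short) y;   /* (2): only 16 output bits needed */

Here both walks agree that the multiplication can be done on unsigned
shorts rather than ints.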

As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code.  The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that if:

- no operations later in the chain require more than N bits; or

- all internally-defined inputs are extended from N bits or fewer,
  and at least one of them is single-use.

See the comments for the rationale.
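
As a made-up illustration of the second condition:

  signed char a, b;
  ...
  int aw = (int) a;      /* extended from 8 bits, single use */
  int bw = (int) b;      /* extended from 8 bits */
  int res = aw + bw;     /* OK to do on 16 bits instead */

If aw and bw both had other 32-bit uses as well, narrowing the addition
could increase the total number of extensions, so it is left alone.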

I didn't bother adding STMT_VINFO_* wrappers for the new fields
since the code seemed more readable without.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-06-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.

Index: gcc/poly-int.h
===================================================================
*** gcc/poly-int.h	2018-06-20 11:36:19.000000000 +0100
--- gcc/poly-int.h	2018-06-20 11:36:20.135890693 +0100
*************** print_dec (const poly_int_pod<N, C> &val
*** 2420,2425 ****
--- 2420,2444 ----
  	     poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
  }
  
+ /* Use print_hex to print VALUE to FILE.  */
+ 
+ template<unsigned int N, typename C>
+ void
+ print_hex (const poly_int_pod<N, C> &value, FILE *file)
+ {
+   if (value.is_constant ())
+     print_hex (value.coeffs[0], file);
+   else
+     {
+       fprintf (file, "[");
+       for (unsigned int i = 0; i < N; ++i)
+ 	{
+ 	  print_hex (value.coeffs[i], file);
+ 	  fputc (i == N - 1 ? ']' : ',', file);
+ 	}
+     }
+ }
+ 
  /* Helper for calculating the distance between two points P1 and P2,
     in cases where known_le (P1, P2).  T1 and T2 are the types of the
     two positions, in either order.  The coefficients of P2 - P1 have
Index: gcc/dumpfile.h
===================================================================
*** gcc/dumpfile.h	2018-06-20 11:36:19.000000000 +0100
--- gcc/dumpfile.h	2018-06-20 11:36:20.131890728 +0100
*************** extern bool enable_rtl_dump_file (void);
*** 288,293 ****
--- 288,295 ----
  
  template<unsigned int N, typename C>
  void dump_dec (dump_flags_t, const poly_int<N, C> &);
+ extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+ extern void dump_hex (dump_flags_t, const poly_wide_int &);
  
  /* In tree-dump.c  */
  extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
*** gcc/dumpfile.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/dumpfile.c	2018-06-20 11:36:20.131890728 +0100
*************** template void dump_dec (dump_flags_t, co
*** 512,517 ****
--- 512,539 ----
  template void dump_dec (dump_flags_t, const poly_offset_int &);
  template void dump_dec (dump_flags_t, const poly_widest_int &);
  
+ void
+ dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+ {
+   if (dump_file && (dump_kind & pflags))
+     print_dec (value, dump_file, sgn);
+ 
+   if (alt_dump_file && (dump_kind & alt_flags))
+     print_dec (value, alt_dump_file, sgn);
+ }
+ 
+ /* Output VALUE in hexadecimal to appropriate dump streams.  */
+ 
+ void
+ dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+ {
+   if (dump_file && (dump_kind & pflags))
+     print_hex (value, dump_file);
+ 
+   if (alt_dump_file && (dump_kind & alt_flags))
+     print_hex (value, alt_dump_file);
+ }
+ 
  /* Start a dump for PHASE. Store user-supplied dump flags in
     *FLAG_PTR.  Return the number of streams opened.  Set globals
     DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h	2018-06-20 11:36:19.000000000 +0100
--- gcc/tree-vectorizer.h	2018-06-20 11:36:20.139890658 +0100
*************** typedef struct _stmt_vec_info {
*** 872,877 ****
--- 872,892 ----
  
    /* The number of scalar stmt references from active SLP instances.  */
    unsigned int num_slp_uses;
+ 
+   /* If nonzero, the lhs of the statement could be truncated to this
+      many bits without affecting any users of the result.  */
+   unsigned int min_output_precision;
+ 
+   /* If nonzero, all non-boolean input operands have the same precision,
+      and they could each be truncated to this many bits without changing
+      the result.  */
+   unsigned int min_input_precision;
+ 
+   /* If OPERATION_PRECISION is nonzero, the statement could be performed on
+      an integer with the sign and number of bits given by OPERATION_SIGN
+      and OPERATION_PRECISION without changing the result.  */
+   unsigned int operation_precision;
+   signop operation_sign;
  } *stmt_vec_info;
  
  /* Information about a gather/scatter call.  */
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/tree-vect-patterns.c	2018-06-20 11:36:20.139890658 +0100
*************** Software Foundation; either version 3, o
*** 47,52 ****
--- 47,86 ----
  #include "omp-simd-clone.h"
  #include "predict.h"
  
+ /* Return true if we have a useful VR_RANGE range for VAR, storing it
+    in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
+ 
+ static bool
+ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+ {
+   value_range_type vr_type = get_range_info (var, min_value, max_value);
+   wide_int nonzero = get_nonzero_bits (var);
+   signop sgn = TYPE_SIGN (TREE_TYPE (var));
+   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+ 					 nonzero, sgn) == VR_RANGE)
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ 	  dump_printf (MSG_NOTE, " has range [");
+ 	  dump_hex (MSG_NOTE, *min_value);
+ 	  dump_printf (MSG_NOTE, ", ");
+ 	  dump_hex (MSG_NOTE, *max_value);
+ 	  dump_printf (MSG_NOTE, "]\n");
+ 	}
+       return true;
+     }
+   else
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ 	  dump_printf (MSG_NOTE, " has no range info\n");
+ 	}
+       return false;
+     }
+ }
+ 
  /* Report that we've found an instance of pattern PATTERN in
     statement STMT.  */
  
*************** vect_supportable_direct_optab_p (tree ot
*** 190,229 ****
    return true;
  }
  
- /* Check whether STMT2 is in the same loop or basic block as STMT1.
-    Which of the two applies depends on whether we're currently doing
-    loop-based or basic-block-based vectorization, as determined by
-    the vinfo_for_stmt for STMT1 (which must be defined).
- 
-    If this returns true, vinfo_for_stmt for STMT2 is guaranteed
-    to be defined as well.  */
- 
- static bool
- vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
- {
-   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
-   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
- }
- 
- /* If the LHS of DEF_STMT has a single use, and that statement is
-    in the same loop or basic block, return it.  */
- 
- static gimple *
- vect_single_imm_use (gimple *def_stmt)
- {
-   tree lhs = gimple_assign_lhs (def_stmt);
-   use_operand_p use_p;
-   gimple *use_stmt;
- 
-   if (!single_imm_use (lhs, &use_p, &use_stmt))
-     return NULL;
- 
-   if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
-     return NULL;
- 
-   return use_stmt;
- }
- 
  /* If OP is defined by a statement that's being considered for vectorization,
     return information about that statement, otherwise return NULL.  */
  
--- 224,229 ----
*************** vect_unpromoted_value::set_op (tree op_i
*** 341,347 ****
     is possible to convert OP' back to OP using a possible sign change
     followed by a possible promotion P.  Return this OP', or null if OP is
     not a vectorizable SSA name.  If there is a promotion P, describe its
!    input in UNPROM, otherwise describe OP' in UNPROM.
  
     A successful return means that it is possible to go from OP' to OP
     via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
--- 341,349 ----
     is possible to convert OP' back to OP using a possible sign change
     followed by a possible promotion P.  Return this OP', or null if OP is
     not a vectorizable SSA name.  If there is a promotion P, describe its
!    input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
!    is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
!    have more than one user.
  
     A successful return means that it is possible to go from OP' to OP
     via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
*************** vect_unpromoted_value::set_op (tree op_i
*** 368,374 ****
  
  static tree
  vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! 				      vect_unpromoted_value *unprom)
  {
    tree res = NULL_TREE;
    tree op_type = TREE_TYPE (op);
--- 370,377 ----
  
  static tree
  vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! 				      vect_unpromoted_value *unprom,
! 				      bool *single_use_p = NULL)
  {
    tree res = NULL_TREE;
    tree op_type = TREE_TYPE (op);
*************** vect_look_through_possible_promotion (ve
*** 417,422 ****
--- 420,430 ----
  	{
  	  def_stmt = vect_look_through_pattern (def_stmt);
  	  caster = vinfo_for_stmt (def_stmt);
+ 	  /* Ignore pattern statements, since we don't link uses for them.  */
+ 	  if (single_use_p
+ 	      && !STMT_VINFO_RELATED_STMT (caster)
+ 	      && !has_single_use (res))
+ 	    *single_use_p = false;
  	}
        else
  	caster = NULL;
*************** vect_recog_widen_sum_pattern (vec<gimple
*** 1307,1669 ****
    return pattern_stmt;
  }
  
  
! /* Return TRUE if the operation in STMT can be performed on a smaller type.
! 
!    Input:
!    STMT - a statement to check.
!    DEF - we support operations with two operands, one of which is constant.
!          The other operand can be defined by a demotion operation, or by a
!          previous statement in a sequence of over-promoted operations.  In the
!          later case DEF is used to replace that operand.  (It is defined by a
!          pattern statement we created for the previous statement in the
!          sequence).
! 
!    Input/output:
!    NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
!          NULL, it's the type of DEF.
!    STMTS - additional pattern statements.  If a pattern statement (type
!          conversion) is created in this function, its original statement is
!          added to STMTS.
  
!    Output:
!    OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
!          operands to use in the new pattern statement for STMT (will be created
!          in vect_recog_over_widening_pattern ()).
!    NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
!          statements for STMT: the first one is a type promotion and the second
!          one is the operation itself.  We return the type promotion statement
! 	 in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
!          the second pattern statement.  */
  
! static bool
! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
! 				  tree *op0, tree *op1, gimple **new_def_stmt,
! 				  vec<gimple *> *stmts)
! {
!   enum tree_code code;
!   tree const_oprnd, oprnd;
!   tree interm_type = NULL_TREE, half_type, new_oprnd, type;
!   gimple *def_stmt, *new_stmt;
!   bool first = false;
!   bool promotion;
  
!   *op0 = NULL_TREE;
!   *op1 = NULL_TREE;
!   *new_def_stmt = NULL;
  
!   if (!is_gimple_assign (stmt))
!     return false;
  
!   code = gimple_assign_rhs_code (stmt);
!   if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
!       && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
!     return false;
  
!   oprnd = gimple_assign_rhs1 (stmt);
!   const_oprnd = gimple_assign_rhs2 (stmt);
!   type = gimple_expr_type (stmt);
  
!   if (TREE_CODE (oprnd) != SSA_NAME
!       || TREE_CODE (const_oprnd) != INTEGER_CST)
!     return false;
  
!   /* If oprnd has other uses besides that in stmt we cannot mark it
!      as being part of a pattern only.  */
!   if (!has_single_use (oprnd))
!     return false;
  
!   /* If we are in the middle of a sequence, we use DEF from a previous
!      statement.  Otherwise, OPRND has to be a result of type promotion.  */
!   if (*new_type)
!     {
!       half_type = *new_type;
!       oprnd = def;
!     }
!   else
      {
!       first = true;
!       if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
! 			      &promotion)
! 	  || !promotion
! 	  || !vect_same_loop_or_bb_p (stmt, def_stmt))
!         return false;
      }
  
!   /* Can we perform the operation on a smaller type?  */
!   switch (code)
!     {
!       case BIT_IOR_EXPR:
!       case BIT_XOR_EXPR:
!       case BIT_AND_EXPR:
!         if (!int_fits_type_p (const_oprnd, half_type))
!           {
!             /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
!             if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!               return false;
! 
!             interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!             if (!int_fits_type_p (const_oprnd, interm_type))
!               return false;
!           }
! 
!         break;
! 
!       case LSHIFT_EXPR:
!         /* Try intermediate type - HALF_TYPE is not enough for sure.  */
!         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!           return false;
! 
!         /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
!           (e.g., if the original value was char, the shift amount is at most 8
!            if we want to use short).  */
!         if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
!           return false;
! 
!         interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! 
!         if (!vect_supportable_shift (code, interm_type))
!           return false;
! 
!         break;
! 
!       case RSHIFT_EXPR:
!         if (vect_supportable_shift (code, half_type))
!           break;
! 
!         /* Try intermediate type - HALF_TYPE is not supported.  */
!         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!           return false;
! 
!         interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! 
!         if (!vect_supportable_shift (code, interm_type))
!           return false;
! 
!         break;
! 
!       default:
!         gcc_unreachable ();
!     }
! 
!   /* There are four possible cases:
!      1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
!         the first statement in the sequence)
!         a. The original, HALF_TYPE, is not enough - we replace the promotion
!            from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
!         b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
!            promotion.
!      2. OPRND is defined by a pattern statement we created.
!         a. Its type is not sufficient for the operation, we create a new stmt:
!            a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
!            this statement in NEW_DEF_STMT, and it is later put in
! 	   STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
!         b. OPRND is good to use in the new statement.  */
!   if (first)
!     {
!       if (interm_type)
!         {
!           /* Replace the original type conversion HALF_TYPE->TYPE with
!              HALF_TYPE->INTERM_TYPE.  */
!           if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
!             {
!               new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
!               /* Check if the already created pattern stmt is what we need.  */
!               if (!is_gimple_assign (new_stmt)
!                   || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
!                   || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
!                 return false;
! 
! 	      stmts->safe_push (def_stmt);
!               oprnd = gimple_assign_lhs (new_stmt);
!             }
!           else
!             {
!               /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
!               oprnd = gimple_assign_rhs1 (def_stmt);
! 	      new_oprnd = make_ssa_name (interm_type);
! 	      new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
!               STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
!               stmts->safe_push (def_stmt);
!               oprnd = new_oprnd;
!             }
!         }
!       else
!         {
!           /* Retrieve the operand before the type promotion.  */
!           oprnd = gimple_assign_rhs1 (def_stmt);
!         }
!     }
!   else
!     {
!       if (interm_type)
!         {
!           /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
! 	  new_oprnd = make_ssa_name (interm_type);
! 	  new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
!           oprnd = new_oprnd;
!           *new_def_stmt = new_stmt;
!         }
  
!       /* Otherwise, OPRND is already set.  */
      }
  
!   if (interm_type)
!     *new_type = interm_type;
!   else
!     *new_type = half_type;
! 
!   *op0 = oprnd;
!   *op1 = fold_convert (*new_type, const_oprnd);
  
!   return true;
  }
  
  
! /* Try to find a statement or a sequence of statements that can be performed
!    on a smaller type:
  
!      type x_t;
!      TYPE x_T, res0_T, res1_T;
!    loop:
!      S1  x_t = *p;
!      S2  x_T = (TYPE) x_t;
!      S3  res0_T = op (x_T, C0);
!      S4  res1_T = op (res0_T, C1);
!      S5  ... = () res1_T;  - type demotion
! 
!    where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
!    constants.
!    Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
!    be 'type' or some intermediate type.  For now, we expect S5 to be a type
!    demotion operation.  We also check that S3 and S4 have only one use.  */
  
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   gimple *stmt = stmts->pop ();
!   gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
! 	 *use_stmt = NULL;
!   tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
!   tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
!   bool first;
!   tree type = NULL;
! 
!   first = true;
!   while (1)
!     {
!       if (!vinfo_for_stmt (stmt)
!           || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
!         return NULL;
! 
!       new_def_stmt = NULL;
!       if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
!                                              &op0, &op1, &new_def_stmt,
!                                              stmts))
!         {
!           if (first)
!             return NULL;
!           else
!             break;
!         }
  
!       /* STMT can be performed on a smaller type.  Check its uses.  */
!       use_stmt = vect_single_imm_use (stmt);
!       if (!use_stmt || !is_gimple_assign (use_stmt))
!         return NULL;
! 
!       /* Create pattern statement for STMT.  */
!       vectype = get_vectype_for_scalar_type (new_type);
!       if (!vectype)
!         return NULL;
! 
!       /* We want to collect all the statements for which we create pattern
!          statetments, except for the case when the last statement in the
!          sequence doesn't have a corresponding pattern statement.  In such
!          case we associate the last pattern statement with the last statement
!          in the sequence.  Therefore, we only add the original statement to
!          the list if we know that it is not the last.  */
!       if (prev_stmt)
!         stmts->safe_push (prev_stmt);
  
!       var = vect_recog_temp_ssa_var (new_type, NULL);
!       pattern_stmt
! 	= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
!       STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
!       new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
  
!       if (dump_enabled_p ())
!         {
!           dump_printf_loc (MSG_NOTE, vect_location,
!                            "created pattern stmt: ");
!           dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
!         }
  
!       type = gimple_expr_type (stmt);
!       prev_stmt = stmt;
!       stmt = use_stmt;
! 
!       first = false;
!     }
! 
!   /* We got a sequence.  We expect it to end with a type demotion operation.
!      Otherwise, we quit (for now).  There are three possible cases: the
!      conversion is to NEW_TYPE (we don't do anything), the conversion is to
!      a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
!      NEW_TYPE differs (we create a new conversion statement).  */
!   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
!     {
!       use_lhs = gimple_assign_lhs (use_stmt);
!       use_type = TREE_TYPE (use_lhs);
!       /* Support only type demotion or signedess change.  */
!       if (!INTEGRAL_TYPE_P (use_type)
! 	  || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
!         return NULL;
  
!       /* Check that NEW_TYPE is not bigger than the conversion result.  */
!       if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
! 	return NULL;
  
!       if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
!           || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
!         {
! 	  *type_out = get_vectype_for_scalar_type (use_type);
! 	  if (!*type_out)
! 	    return NULL;
  
!           /* Create NEW_TYPE->USE_TYPE conversion.  */
! 	  new_oprnd = make_ssa_name (use_type);
! 	  pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
!           STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
! 
!           /* We created a pattern statement for the last statement in the
!              sequence, so we don't need to associate it with the pattern
!              statement created for PREV_STMT.  Therefore, we add PREV_STMT
!              to the list in order to mark it later in vect_pattern_recog_1.  */
!           if (prev_stmt)
!             stmts->safe_push (prev_stmt);
!         }
!       else
!         {
!           if (prev_stmt)
! 	    STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
! 	       = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
  
! 	  *type_out = vectype;
!         }
  
!       stmts->safe_push (use_stmt);
!     }
!   else
!     /* TODO: support general case, create a conversion to the correct type.  */
      return NULL;
  
!   /* Pattern detected.  */
!   vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
  
    return pattern_stmt;
  }
  
--- 1315,1632 ----
    return pattern_stmt;
  }
  
+ /* Recognize cases in which an operation is performed in one type WTYPE
+    but could be done more efficiently in a narrower type NTYPE.  For example,
+    if we have:
+ 
+      ATYPE a;  // narrower than NTYPE
+      BTYPE b;  // narrower than NTYPE
+      WTYPE aw = (WTYPE) a;
+      WTYPE bw = (WTYPE) b;
+      WTYPE res = aw + bw;  // only uses of aw and bw
+ 
+    then it would be more efficient to do:
+ 
+      NTYPE an = (NTYPE) a;
+      NTYPE bn = (NTYPE) b;
+      NTYPE resn = an + bn;
+      WTYPE res = (WTYPE) resn;
+ 
+    Other situations include things like:
+ 
+      ATYPE a;  // NTYPE or narrower
+      WTYPE aw = (WTYPE) a;
+      WTYPE res = aw + b;
+ 
+    when only "(NTYPE) res" is significant.  In that case it's more efficient
+    to truncate "b" and do the operation on NTYPE instead:
+ 
+      NTYPE an = (NTYPE) a;
+      NTYPE bn = (NTYPE) b;  // truncation
+      NTYPE resn = an + bn;
+      WTYPE res = (WTYPE) resn;
+ 
+    All users of "res" should then use "resn" instead, making the final
+    statement dead (not marked as relevant).  The final statement is still
+    needed to maintain the type correctness of the IR.
+ 
+    vect_determine_precisions has already determined the minimum
+    precision of the operation and the minimum precision required
+    by users of the result.  */
  
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
!   if (!last_stmt)
!     return NULL;
  
!   /* See whether we have found that this operation can be done on a
!      narrower type without changing its semantics.  */
!   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
!   unsigned int new_precision = last_stmt_info->operation_precision;
!   if (!new_precision)
!     return NULL;
  
!   vec_info *vinfo = last_stmt_info->vinfo;
!   tree lhs = gimple_assign_lhs (last_stmt);
!   tree type = TREE_TYPE (lhs);
!   tree_code code = gimple_assign_rhs_code (last_stmt);
! 
!   /* Keep the first operand of a COND_EXPR as-is: only the other two
!      operands are interesting.  */
!   unsigned int first_op = (code == COND_EXPR ? 2 : 1);
! 
!   /* Check the operands.  */
!   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
!   auto_vec <vect_unpromoted_value, 3> unprom (nops);
!   unprom.quick_grow (nops);
!   unsigned int min_precision = 0;
!   bool single_use_p = false;
!   for (unsigned int i = 0; i < nops; ++i)
!     {
!       tree op = gimple_op (last_stmt, first_op + i);
!       if (TREE_CODE (op) == INTEGER_CST)
! 	unprom[i].set_op (op, vect_constant_def);
!       else if (TREE_CODE (op) == SSA_NAME)
! 	{
! 	  bool op_single_use_p = true;
! 	  if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
! 						     &op_single_use_p))
! 	    return NULL;
! 	  /* If:
  
! 	     (1) N bits of the result are needed;
! 	     (2) all inputs are widened from M<N bits; and
! 	     (3) one operand OP is a single-use SSA name
! 
! 	     we can shift the M->N widening from OP to the output
! 	     without changing the number or type of extensions involved.
! 	     This then reduces the number of copies of STMT_INFO.
! 
! 	     If instead of (3) more than one operand is a single-use SSA name,
! 	     shifting the extension to the output is even more of a win.
! 
! 	     If instead:
! 
! 	     (1) N bits of the result are needed;
! 	     (2) one operand OP2 is widened from M2<N bits;
! 	     (3) another operand OP1 is widened from M1<M2 bits; and
! 	     (4) both OP1 and OP2 are single-use
! 
! 	     the choice is between:
! 
! 	     (a) truncating OP2 to M1, doing the operation on M1,
! 		 and then widening the result to N
! 
! 	     (b) widening OP1 to M2, doing the operation on M2, and then
! 		 widening the result to N
! 
! 	     Both shift the M2->N widening of the inputs to the output.
! 	     (a) additionally shifts the M1->M2 widening to the output;
! 	     it requires fewer copies of STMT_INFO but requires an extra
! 	     M2->M1 truncation.
! 
! 	     Which is better will depend on the complexity and cost of
! 	     STMT_INFO, which is hard to predict at this stage.  However,
! 	     a clear tie-breaker in favor of (b) is the fact that the
! 	     truncation in (a) increases the length of the operation chain.
! 
! 	     If instead of (4) only one of OP1 or OP2 is single-use,
! 	     (b) is still a win over doing the operation in N bits:
! 	     it still shifts the M2->N widening on the single-use operand
! 	     to the output and reduces the number of STMT_INFO copies.
! 
! 	     If neither operand is single-use then operating on fewer than
! 	     N bits might lead to more extensions overall.  Whether it does
! 	     or not depends on global information about the vectorization
! 	     region, and whether that's a good trade-off would again
! 	     depend on the complexity and cost of the statements involved,
! 	     as well as things like register pressure that are not normally
! 	     modelled at this stage.  We therefore ignore these cases
! 	     and just optimize the clear single-use wins above.
! 
! 	     Thus we take the maximum precision of the unpromoted operands
! 	     and record whether any operand is single-use.  */
! 	  if (unprom[i].dt == vect_internal_def)
! 	    {
! 	      min_precision = MAX (min_precision,
! 				   TYPE_PRECISION (unprom[i].type));
! 	      single_use_p |= op_single_use_p;
! 	    }
! 	}
!     }
  
!   /* Although the operation could be done in operation_precision, we have
!      to balance that against introducing extra truncations or extensions.
!      Calculate the minimum precision that can be handled efficiently.
! 
!      The loop above determined that the operation could be handled
!      efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
!      extension from the inputs to the output without introducing more
!      instructions, and would reduce the number of instructions required
!      for STMT_INFO itself.
! 
!      vect_determine_precisions has also determined that the result only
!      needs min_output_precision bits.  Truncating by a factor of N times
!      requires a tree of N - 1 instructions, so if TYPE is N times wider
!      than min_output_precision, doing the operation in TYPE and truncating
!      the result requires N + (N - 1) = 2N - 1 instructions per output vector.
!      In contrast:
! 
!      - truncating the input to a unary operation and doing the operation
!        in the new type requires at most N - 1 + 1 = N instructions per
!        output vector
! 
!      - doing the same for a binary operation requires at most
!        (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
! 
!      Both unary and binary operations require fewer instructions than
!      this if the operands were extended from a suitable truncated form.
!      Thus there is usually nothing to lose by doing operations in
!      min_output_precision bits, but there can be something to gain.  */
!   if (!single_use_p)
!     min_precision = last_stmt_info->min_output_precision;
!   else
!     min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
  
!   /* Apply the minimum efficient precision we just calculated.  */
!   if (new_precision < min_precision)
!     new_precision = min_precision;
!   if (new_precision >= TYPE_PRECISION (type))
!     return NULL;
  
!   vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
  
!   *type_out = get_vectype_for_scalar_type (type);
!   if (!*type_out)
!     return NULL;
  
!   /* We've found a viable pattern.  Get the new type of the operation.  */
!   bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
!   tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
! 
!   /* We specifically don't check here whether the target supports the
!      new operation, since it might be something that a later pattern
!      wants to rewrite anyway.  If targets have a minimum element size
!      for some optabs, we should pattern-match smaller ops to larger ops
!      where beneficial.  */
!   tree new_vectype = get_vectype_for_scalar_type (new_type);
!   if (!new_vectype)
!     return NULL;
  
!   if (dump_enabled_p ())
      {
!       dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
!       dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
!       dump_printf (MSG_NOTE, " to ");
!       dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
!       dump_printf (MSG_NOTE, "\n");
      }
  
!   /* Calculate the rhs operands for an operation on NEW_TYPE.  */
!   STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
!   tree ops[3] = {};
!   for (unsigned int i = 1; i < first_op; ++i)
!     ops[i - 1] = gimple_op (last_stmt, i);
!   vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
! 		       new_type, &unprom[0], new_vectype);
! 
!   /* Use the operation to produce a result of type NEW_TYPE.  */
!   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
!   gimple *pattern_stmt = gimple_build_assign (new_var, code,
! 					      ops[0], ops[1], ops[2]);
!   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
  
!   if (dump_enabled_p ())
!     {
!       dump_printf_loc (MSG_NOTE, vect_location,
! 		       "created pattern stmt: ");
!       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
      }
  
!   pattern_stmt = vect_convert_output (last_stmt_info, type,
! 				      pattern_stmt, new_vectype);
  
!   stmts->safe_push (last_stmt);
!   return pattern_stmt;
  }
  
+ /* Recognize cases in which the input to a cast is wider than its
+    output, and the input is fed by a widening operation.  Fold this
+    by removing the unnecessary intermediate widening.  E.g.:
  
!      unsigned char a;
!      unsigned int b = (unsigned int) a;
!      unsigned short c = (unsigned short) b;
  
!    -->
  
!      unsigned short c = (unsigned short) a;
  
!    Although this is rare in input IR, it is an expected side-effect
!    of the over-widening pattern above.
  
!    This is beneficial also for integer-to-float conversions, if the
!    widened integer has more bits than the float, and if the unwidened
!    input doesn't.  */
  
! static gimple *
! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   /* Check for a cast, including an integer-to-float conversion.  */
!   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
!   if (!last_stmt)
!     return NULL;
!   tree_code code = gimple_assign_rhs_code (last_stmt);
!   if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
!     return NULL;
  
!   /* Make sure that the lhs is a scalar with a natural bitsize.  */
!   tree lhs = gimple_assign_lhs (last_stmt);
!   if (!lhs)
!     return NULL;
!   tree lhs_type = TREE_TYPE (lhs);
!   scalar_mode lhs_mode;
!   if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
!       || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
!     return NULL;
  
!   /* Check for a narrowing operation (from a vector point of view).  */
!   tree rhs = gimple_assign_rhs1 (last_stmt);
!   tree rhs_type = TREE_TYPE (rhs);
!   if (!INTEGRAL_TYPE_P (rhs_type)
!       || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
!       || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
!     return NULL;
  
!   /* Try to find an unpromoted input.  */
!   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
!   vec_info *vinfo = last_stmt_info->vinfo;
!   vect_unpromoted_value unprom;
!   if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
!       || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
!     return NULL;
  
!   /* If the bits above RHS_TYPE matter, make sure that they're the
!      same when extending from UNPROM as they are when extending from RHS.  */
!   if (!INTEGRAL_TYPE_P (lhs_type)
!       && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
!     return NULL;
  
!   /* We can get the same result by casting UNPROM directly, to avoid
!      the unnecessary widening and narrowing.  */
!   vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
  
!   *type_out = get_vectype_for_scalar_type (lhs_type);
!   if (!*type_out)
      return NULL;
  
!   tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
!   gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
!   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
  
+   stmts->safe_push (last_stmt);
    return pattern_stmt;
  }
  
*************** vect_recog_gather_scatter_pattern (vec<g
*** 4145,4150 ****
--- 4108,4498 ----
    return pattern_stmt;
  }
  
+ /* Return true if TYPE is a non-boolean integer type.  These are the types
+    that we want to consider for narrowing.  */
+ 
+ static bool
+ vect_narrowable_type_p (tree type)
+ {
+   return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+ }
+ 
+ /* Return true if the operation given by CODE can be truncated to N bits
+    when only N bits of the output are needed.  This is only true if bit N+1
+    of the inputs has no effect on the low N bits of the result.  */
+ 
+ static bool
+ vect_truncatable_operation_p (tree_code code)
+ {
+   switch (code)
+     {
+     case PLUS_EXPR:
+     case MINUS_EXPR:
+     case MULT_EXPR:
+     case BIT_AND_EXPR:
+     case BIT_IOR_EXPR:
+     case BIT_XOR_EXPR:
+     case COND_EXPR:
+       return true;
+ 
+     default:
+       return false;
+     }
+ }
+ 
+ /* Record that STMT_INFO could be changed from operating on TYPE to
+    operating on a type with the precision and sign given by PRECISION
+    and SIGN respectively.  PRECISION is an arbitrary bit precision;
+    it might not be a whole number of bytes.  */
+ 
+ static void
+ vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+ 			 unsigned int precision, signop sign)
+ {
+   /* Round the precision up to a whole number of bytes.  */
+   precision = 1 << ceil_log2 (precision);
+   precision = MAX (precision, BITS_PER_UNIT);
+   if (precision < TYPE_PRECISION (type)
+       && (!stmt_info->operation_precision
+ 	  || stmt_info->operation_precision > precision))
+     {
+       stmt_info->operation_precision = precision;
+       stmt_info->operation_sign = sign;
+     }
+ }
+ 
+ /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+    non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
+    is an arbitrary bit precision; it might not be a whole number of bytes.  */
+ 
+ static void
+ vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+ 			      unsigned int min_input_precision)
+ {
+   /* This operation in isolation only requires the inputs to have
+      MIN_INPUT_PRECISION of precision.  However, that doesn't mean
+      that MIN_INPUT_PRECISION is a natural precision for the chain
+      as a whole.  E.g. consider something like:
+ 
+ 	 unsigned short *x, *y;
+ 	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+ 
+      The right shift can be done on unsigned chars, and only requires the
+      result of "*x & 0xf0" to be done on unsigned chars.  But taking that
+      approach would mean turning a natural chain of single-vector unsigned
+      short operations into one that truncates "*x" and then extends
+      "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+      operation and one vector for each unsigned char operation.
+      This would be a significant pessimization.
+ 
+      Instead only propagate the maximum of this precision and the precision
+      required by the users of the result.  This means that we don't pessimize
+      the case above but continue to optimize things like:
+ 
+ 	 unsigned char *y;
+ 	 unsigned short *x;
+ 	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+ 
+      Here we would truncate two vectors of *x to a single vector of
+      unsigned chars and use single-vector unsigned char operations for
+      everything else, rather than doing two unsigned short copies of
+      "(*x & 0xf0) >> 4" and then truncating the result.  */
+   min_input_precision = MAX (min_input_precision,
+ 			     stmt_info->min_output_precision);
+ 
+   if (min_input_precision < TYPE_PRECISION (type)
+       && (!stmt_info->min_input_precision
+ 	  || stmt_info->min_input_precision > min_input_precision))
+     stmt_info->min_input_precision = min_input_precision;
+ }
+ 
+ /* Subroutine of vect_determine_min_output_precision.  Return true if
+    we can calculate a reduced number of output bits for STMT_INFO,
+    whose result is LHS.  */
+ 
+ static bool
+ vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+ {
+   /* Take the maximum precision required by users of the result.  */
+   unsigned int precision = 0;
+   imm_use_iterator iter;
+   use_operand_p use;
+   FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+     {
+       gimple *use_stmt = USE_STMT (use);
+       if (is_gimple_debug (use_stmt))
+ 	continue;
+       if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+ 	return false;
+       stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+       if (!use_stmt_info->min_input_precision)
+ 	return false;
+       precision = MAX (precision, use_stmt_info->min_input_precision);
+     }
+ 
+   if (dump_enabled_p ())
+     {
+       dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+ 		       precision);
+       dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+       dump_printf (MSG_NOTE, " are significant\n");
+     }
+   stmt_info->min_output_precision = precision;
+   return true;
+ }
+ 
+ /* Calculate min_output_precision for STMT_INFO.  */
+ 
+ static void
+ vect_determine_min_output_precision (stmt_vec_info stmt_info)
+ {
+   /* We're only interested in statements with a narrowable result.  */
+   tree lhs = gimple_get_lhs (stmt_info->stmt);
+   if (!lhs
+       || TREE_CODE (lhs) != SSA_NAME
+       || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+     return;
+ 
+   if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+     stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+ }
+ 
+ /* Use range information to decide whether STMT (described by STMT_INFO)
+    could be done in a narrower type.  This is effectively a forward
+    propagation, since it uses context-independent information that applies
+    to all users of an SSA name.  */
+ 
+ static void
+ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+ {
+   tree lhs = gimple_assign_lhs (stmt);
+   if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+     return;
+ 
+   tree type = TREE_TYPE (lhs);
+   if (!vect_narrowable_type_p (type))
+     return;
+ 
+   /* First see whether we have any useful range information for the result.  */
+   unsigned int precision = TYPE_PRECISION (type);
+   signop sign = TYPE_SIGN (type);
+   wide_int min_value, max_value;
+   if (!vect_get_range_info (lhs, &min_value, &max_value))
+     return;
+ 
+   tree_code code = gimple_assign_rhs_code (stmt);
+   unsigned int nops = gimple_num_ops (stmt);
+ 
+   if (!vect_truncatable_operation_p (code))
+     /* Check that all relevant input operands are compatible, and update
+        [MIN_VALUE, MAX_VALUE] to include their ranges.  */
+     for (unsigned int i = 1; i < nops; ++i)
+       {
+ 	tree op = gimple_op (stmt, i);
+ 	if (TREE_CODE (op) == INTEGER_CST)
+ 	  {
+ 	    /* Don't require the integer to have RHS_TYPE (which it might
+ 	       not for things like shift amounts, etc.), but do require it
+ 	       to fit the type.  */
+ 	    if (!int_fits_type_p (op, type))
+ 	      return;
+ 
+ 	    min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+ 	    max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+ 	  }
+ 	else if (TREE_CODE (op) == SSA_NAME)
+ 	  {
+ 	    /* Ignore codes that don't take uniform arguments.  */
+ 	    if (!types_compatible_p (TREE_TYPE (op), type))
+ 	      return;
+ 
+ 	    wide_int op_min_value, op_max_value;
+ 	    if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+ 	      return;
+ 
+ 	    min_value = wi::min (min_value, op_min_value, sign);
+ 	    max_value = wi::max (max_value, op_max_value, sign);
+ 	  }
+ 	else
+ 	  return;
+       }
+ 
+   /* Try to switch signed types for unsigned types if we can.
+      This is better for two reasons.  First, unsigned ops tend
+      to be cheaper than signed ops.  Second, it means that we can
+      handle things like:
+ 
+ 	signed char c;
+ 	int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+ 
+      as:
+ 
+ 	signed char c;
+ 	unsigned short res_1 = (unsigned short) c & 0xff00;
+ 	int res = (int) res_1;
+ 
+      where the intermediate result res_1 has unsigned rather than
+      signed type.  */
+   if (sign == SIGNED && !wi::neg_p (min_value))
+     sign = UNSIGNED;
+ 
+   /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
+   unsigned int precision1 = wi::min_precision (min_value, sign);
+   unsigned int precision2 = wi::min_precision (max_value, sign);
+   unsigned int value_precision = MAX (precision1, precision2);
+   if (value_precision >= precision)
+     return;
+ 
+   if (dump_enabled_p ())
+     {
+       dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ 		       " without loss of precision: ",
+ 		       sign == SIGNED ? "signed" : "unsigned",
+ 		       value_precision);
+       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+     }
+ 
+   vect_set_operation_type (stmt_info, type, value_precision, sign);
+   vect_set_min_input_precision (stmt_info, type, value_precision);
+ }
+ 
+ /* Use information about the users of STMT's result to decide whether
+    STMT (described by STMT_INFO) could be done in a narrower type.
+    This is effectively a backward propagation.  */
+ 
+ static void
+ vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+ {
+   tree_code code = gimple_assign_rhs_code (stmt);
+   unsigned int opno = (code == COND_EXPR ? 2 : 1);
+   tree type = TREE_TYPE (gimple_op (stmt, opno));
+   if (!vect_narrowable_type_p (type))
+     return;
+ 
+   unsigned int precision = TYPE_PRECISION (type);
+   unsigned int operation_precision, min_input_precision;
+   switch (code)
+     {
+     CASE_CONVERT:
+       /* Only the bits that contribute to the output matter.  Don't change
+ 	 the precision of the operation itself.  */
+       operation_precision = precision;
+       min_input_precision = stmt_info->min_output_precision;
+       break;
+ 
+     case LSHIFT_EXPR:
+     case RSHIFT_EXPR:
+       {
+ 	tree shift = gimple_assign_rhs2 (stmt);
+ 	if (TREE_CODE (shift) != INTEGER_CST
+ 	    || !wi::ltu_p (wi::to_widest (shift), precision))
+ 	  return;
+ 	unsigned int const_shift = TREE_INT_CST_LOW (shift);
+ 	if (code == LSHIFT_EXPR)
+ 	  {
+ 	    /* We need CONST_SHIFT fewer bits of the input.  */
+ 	    operation_precision = stmt_info->min_output_precision;
+ 	    min_input_precision = (MAX (operation_precision, const_shift)
+ 				    - const_shift);
+ 	  }
+ 	else
+ 	  {
+ 	    /* We need CONST_SHIFT extra bits to do the operation.  */
+ 	    operation_precision = (stmt_info->min_output_precision
+ 				   + const_shift);
+ 	    min_input_precision = operation_precision;
+ 	  }
+ 	break;
+       }
+ 
+     default:
+       if (vect_truncatable_operation_p (code))
+ 	{
+ 	  /* Input bit N has no effect on output bits N-1 and lower.  */
+ 	  operation_precision = stmt_info->min_output_precision;
+ 	  min_input_precision = operation_precision;
+ 	  break;
+ 	}
+       return;
+     }
+ 
+   if (operation_precision < precision)
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ 			   " without affecting users: ",
+ 			   TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+ 			   operation_precision);
+ 	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ 	}
+       vect_set_operation_type (stmt_info, type, operation_precision,
+ 			       TYPE_SIGN (type));
+     }
+   vect_set_min_input_precision (stmt_info, type, min_input_precision);
+ }
+ 
+ /* Handle vect_determine_precisions for STMT_INFO, given that we
+    have already done so for the users of its result.  */
+ 
+ void
+ vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+ {
+   vect_determine_min_output_precision (stmt_info);
+   if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+     {
+       vect_determine_precisions_from_range (stmt_info, stmt);
+       vect_determine_precisions_from_users (stmt_info, stmt);
+     }
+ }
+ 
+ /* Walk backwards through the vectorizable region to determine the
+    values of these fields:
+ 
+    - min_output_precision
+    - min_input_precision
+    - operation_precision
+    - operation_sign.  */
+ 
+ void
+ vect_determine_precisions (vec_info *vinfo)
+ {
+   DUMP_VECT_SCOPE ("vect_determine_precisions");
+ 
+   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+     {
+       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+       basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+       unsigned int nbbs = loop->num_nodes;
+ 
+       for (unsigned int i = 0; i < nbbs; i++)
+ 	{
+ 	  basic_block bb = bbs[nbbs - i - 1];
+ 	  for (gimple_stmt_iterator si = gsi_last_bb (bb);
+ 	       !gsi_end_p (si); gsi_prev (&si))
+ 	    vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+ 	}
+     }
+   else
+     {
+       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+       gimple_stmt_iterator si = bb_vinfo->region_end;
+       gimple *stmt;
+       do
+ 	{
+ 	  if (!gsi_stmt (si))
+ 	    si = gsi_last_bb (bb_vinfo->bb);
+ 	  else
+ 	    gsi_prev (&si);
+ 	  stmt = gsi_stmt (si);
+ 	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ 	  if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+ 	    vect_determine_stmt_precisions (stmt_info);
+ 	}
+       while (stmt != gsi_stmt (bb_vinfo->region_begin));
+     }
+ }
+ 
  typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
  
  struct vect_recog_func
*************** struct vect_recog_func
*** 4157,4169 ****
     taken which means usually the more complex one needs to preceed the
     less comples onex (widen_sum only after dot_prod or sad for example).  */
  static vect_recog_func vect_vect_recog_func_ptrs[] = {
    { vect_recog_widen_mult_pattern, "widen_mult" },
    { vect_recog_dot_prod_pattern, "dot_prod" },
    { vect_recog_sad_pattern, "sad" },
    { vect_recog_widen_sum_pattern, "widen_sum" },
    { vect_recog_pow_pattern, "pow" },
    { vect_recog_widen_shift_pattern, "widen_shift" },
-   { vect_recog_over_widening_pattern, "over_widening" },
    { vect_recog_rotate_pattern, "rotate" },
    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
    { vect_recog_divmod_pattern, "divmod" },
--- 4505,4518 ----
     taken which means usually the more complex one needs to preceed the
     less comples onex (widen_sum only after dot_prod or sad for example).  */
  static vect_recog_func vect_vect_recog_func_ptrs[] = {
+   { vect_recog_over_widening_pattern, "over_widening" },
+   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
    { vect_recog_widen_mult_pattern, "widen_mult" },
    { vect_recog_dot_prod_pattern, "dot_prod" },
    { vect_recog_sad_pattern, "sad" },
    { vect_recog_widen_sum_pattern, "widen_sum" },
    { vect_recog_pow_pattern, "pow" },
    { vect_recog_widen_shift_pattern, "widen_shift" },
    { vect_recog_rotate_pattern, "rotate" },
    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
    { vect_recog_divmod_pattern, "divmod" },
*************** vect_pattern_recog (vec_info *vinfo)
*** 4437,4442 ****
--- 4786,4793 ----
    unsigned int i, j;
    auto_vec<gimple *, 1> stmts_to_replace;
  
+   vect_determine_precisions (vinfo);
+ 
    DUMP_VECT_SCOPE ("vect_pattern_recog");
  
    if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 62,69 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 62,70 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 58,64 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 58,66 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,63 ****
    return 0;
  }
  
! /* Final value stays in int, so no over-widening is detected at the moment.  */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,68 ----
    return 0;
  }
  
! /* This is an over-widening even though the final result is still an int.
!    It's better to do one vector of ops on chars and then widen than to
!    widen and then do 4 vectors of ops on ints.  */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
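To put numbers on the comment above: with 128-bit vectors, a vector
of chars holds 16 elements and a vector of ints only 4, hence the
"one vector ... rather than 4 vectors" trade-off.  How that balances
against the extra packs and unpacks is of course target-specific, but
the narrower version is expected to win comfortably.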
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,63 ****
    return 0;
  }
  
! /* Final value stays in int, so no over-widening is detected at the moment.  */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,68 ----
    return 0;
  }
  
! /* This is an over-widening even though the final result is still an int.
!    It's better to do one vector of ops on chars and then widen than to
!    widen and then do 4 vectors of ops on ints.  */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,62 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,65 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 59,65 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 59,67 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 66,73 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 66,74 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 62,68 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 62,70 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,66 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ /* Deliberate use of signed >>.  */
+ #define DEF_LOOP(SIGNEDNESS)			\
+   void __attribute__ ((noipa))			\
+   f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+ 		  SIGNEDNESS char *restrict b,	\
+ 		  SIGNEDNESS char *restrict c)	\
+   {						\
+     a[0] = (b[0] + c[0]) >> 1;			\
+     a[1] = (b[1] + c[1]) >> 1;			\
+     a[2] = (b[2] + c[2]) >> 1;			\
+     a[3] = (b[3] + c[3]) >> 1;			\
+     a[4] = (b[4] + c[4]) >> 1;			\
+     a[5] = (b[5] + c[5]) >> 1;			\
+     a[6] = (b[6] + c[6]) >> 1;			\
+     a[7] = (b[7] + c[7]) >> 1;			\
+     a[8] = (b[8] + c[8]) >> 1;			\
+     a[9] = (b[9] + c[9]) >> 1;			\
+     a[10] = (b[10] + c[10]) >> 1;		\
+     a[11] = (b[11] + c[11]) >> 1;		\
+     a[12] = (b[12] + c[12]) >> 1;		\
+     a[13] = (b[13] + c[13]) >> 1;		\
+     a[14] = (b[14] + c[14]) >> 1;		\
+     a[15] = (b[15] + c[15]) >> 1;		\
+   }
+ 
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+ 
+ #define N 16
+ 
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C)		\
+   {							\
+     SIGNEDNESS char a[N], b[N], c[N];			\
+     for (int i = 0; i < N; ++i)				\
+       {							\
+ 	b[i] = BASE_B + i * 15;				\
+ 	c[i] = BASE_C + i * 14;				\
+ 	asm volatile ("" ::: "memory");			\
+       }							\
+     f_##SIGNEDNESS (a, b, c);				\
+     for (int i = 0; i < N; ++i)				\
+       if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
+ 	__builtin_abort ();				\
+   }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   TEST_LOOP (signed, -128, -120);
+   TEST_LOOP (unsigned, 4, 10);
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,65 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ /* Deliberate use of signed >>.  */
+ #define DEF_LOOP(SIGNEDNESS)			\
+   void __attribute__ ((noipa))			\
+   f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+ 		  SIGNEDNESS char *restrict b,	\
+ 		  SIGNEDNESS char c)		\
+   {						\
+     a[0] = (b[0] + c) >> 1;			\
+     a[1] = (b[1] + c) >> 1;			\
+     a[2] = (b[2] + c) >> 1;			\
+     a[3] = (b[3] + c) >> 1;			\
+     a[4] = (b[4] + c) >> 1;			\
+     a[5] = (b[5] + c) >> 1;			\
+     a[6] = (b[6] + c) >> 1;			\
+     a[7] = (b[7] + c) >> 1;			\
+     a[8] = (b[8] + c) >> 1;			\
+     a[9] = (b[9] + c) >> 1;			\
+     a[10] = (b[10] + c) >> 1;			\
+     a[11] = (b[11] + c) >> 1;			\
+     a[12] = (b[12] + c) >> 1;			\
+     a[13] = (b[13] + c) >> 1;			\
+     a[14] = (b[14] + c) >> 1;			\
+     a[15] = (b[15] + c) >> 1;			\
+   }
+ 
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+ 
+ #define N 16
+ 
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, C)		\
+   {							\
+     SIGNEDNESS char a[N], b[N], c[N];			\
+     for (int i = 0; i < N; ++i)				\
+       {							\
+ 	b[i] = BASE_B + i * 15;				\
+ 	asm volatile ("" ::: "memory");			\
+       }							\
+     f_##SIGNEDNESS (a, b, C);				\
+     for (int i = 0; i < N; ++i)				\
+       if (a[i] != (BASE_B + C + i * 15) >> 1)		\
+ 	__builtin_abort ();				\
+   }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   TEST_LOOP (signed, -128, -120);
+   TEST_LOOP (unsigned, 4, 250);
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   /* Deliberate use of signed >>.  */
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) >> 1;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
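For reference, the narrowed sequence that the two patterns aim to
produce is roughly the following (a source-level sketch rather than
the exact gimple):

  signed short bs = (signed short) b[i];  /* unpack char to short */
  signed short cs = (signed short) c[i];
  signed short sum = bs + cs;             /* needs only 9 signed bits */
  signed short tmp = sum >> 1;
  a[i] = (signed char) tmp;               /* single truncating cast */

with no int vectors left at all, which is what the scan-tree-dump-not
line checks.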
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c	2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ 
+ #include "vect-over-widen-5.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c	2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #define D -120
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+ {
+   int promoted_d = d;
+   for (int i = 0; i < N; ++i)
+     /* Deliberate use of signed >>.  */
+     a[i] = (b[i] + c[i] + promoted_d) >> 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, D);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
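The backward propagation works out as follows here: the store needs
only the low 8 bits of the shift result, so the ">> 2" needs
8 + 2 = 10 significant bits of its input, and so the additions only
need to be correct to 10 bits, which rounds up to a 16-bit operation.
(That's the reasoning I'd expect the pass to follow; the dump lines
above are what the test actually verifies.)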
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c	2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #define D 251
+ #endif
+ 
+ #include "vect-over-widen-7.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c	2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     {
+       /* Deliberate use of signed >>.  */
+       int res = b[i] + c[i];
+       a[i] = (res + (res >> 1)) >> 2;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     {
+       int res = BASE_B + BASE_C + i * 9;
+       if (a[i] != ((res + (res >> 1)) >> 2))
+ 	__builtin_abort ();
+     }
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
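For the range-based route in this test: b[i] and c[i] are signed
chars, so res is in [-256, 254], res >> 1 is in [-128, 127], and
res + (res >> 1) is in [-384, 381].  Everything therefore fits in 10
signed bits and 16-bit operations are enough.  (Those are the
conservative type-based ranges; the values actually used in main are
narrower still.)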
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-9.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,63 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short, with "res"
+    being extended for the store to d[i].  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c, int *restrict d)
+ {
+   for (int i = 0; i < N; ++i)
+     {
+       /* Deliberate use of signed >>.  */
+       int res = b[i] + c[i];
+       a[i] = (res + (res >> 1)) >> 2;
+       d[i] = res;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   int d[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d);
+   for (int i = 0; i < N; ++i)
+     {
+       int res = BASE_B + BASE_C + i * 9;
+       if (a[i] != ((res + (res >> 1)) >> 2))
+ 	__builtin_abort ();
+       if (d[i] != res)
+ 	__builtin_abort ();
+     }
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
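The shape I'd expect here is roughly (again a source-level sketch):

  short res16 = (short) b[i] + (short) c[i];
  a[i] = (signed char) ((res16 + (res16 >> 1)) >> 2);
  d[i] = (int) res16;   /* one extension, for the int store only */

i.e. the arithmetic chain still narrows to shorts and only the store
to d[i] keeps an int vector live, hence the positive
{vector[^ ]* int} scan above.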
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-11.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+ 
+ #define N 50
+ 
+ /* We rely on range analysis to show that these calculations can be done
+    in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) / 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
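Note that the signed test expects "/ 2" to survive into the dump:
signed division by 2 can't simply become ">> 1", since division
rounds towards zero while an arithmetic right shift rounds towards
minus infinity (-3 / 2 is -1 but -3 >> 1 is -2).  The unsigned
variant is folded to a shift earlier, which is why
vect-over-widen-14.c scans for ">> 1" instead.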
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-13.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+ 
+ #define N 50
+ 
+ /* We rely on range analysis to show that these calculations can be done
+    in SIGNEDNESS short, with the result being extended to int for the
+    store.  */
+ void __attribute__ ((noipa))
+ f (int *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) / 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   int a[N];
+   SIGNEDNESS char b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-15.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 1024
+ 
+ /* This should not be treated as an over-widening pattern, even though
+    "(b[i] & 0xef) | 0x80)" could be done in unsigned chars.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned short *restrict a, unsigned short *restrict b)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+       a[i] = foo;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned short a[N], b[N];
+   for (int i = 0; i < N; ++i)
+     {
+       a[i] = i;
+       b[i] = i * 3;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 1024
+ 
+ /* This should be treated as an over-widening pattern: we can truncate
+    b to unsigned char after loading it and do all the computation in
+    unsigned char.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned char *restrict a, unsigned short *restrict b)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+       a[i] = foo;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned char a[N];
+   unsigned short b[N];
+   for (int i = 0; i < N; ++i)
+     {
+       a[i] = i;
+       b[i] = i * 3;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
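The difference from vect-over-widen-17.c is purely the type of the
store: there all 16 bits of foo were needed, so nothing could narrow;
here the store only needs the low 8 bits, so the whole chain can be
done in unsigned char, roughly (as a sketch):

  unsigned char bi = (unsigned char) b[i];    /* truncate after load */
  unsigned char sh = (unsigned char) (a[i] << 4);
  a[i] = ((bi & 0xef) | 0x80) + sh;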
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned int di = d[i];
+       unsigned int ei = e[i];
+       a[i] = di;
+       b[i] = ei;
+       c[i] = di + ei;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       int di = d[i];
+       int ei = e[i];
+       a[i] = di;
+       b[i] = ei;
+       c[i] = di + ei;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c	2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       a[i] = d[i];
+       b[i] = e[i];
+       c[i] = d[i] + e[i];
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [14/n] PR85694: Rework overwidening detection
  2018-06-20 10:37 [14/n] PR85694: Rework overwidening detection Richard Sandiford
@ 2018-06-29 12:56 ` Richard Sandiford
  2018-07-02 11:02   ` Christophe Lyon
  2018-07-02 13:12   ` Richard Biener
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Sandiford @ 2018-06-29 12:56 UTC (permalink / raw)
  To: gcc-patches

Richard Sandiford <richard.sandiford@arm.com> writes:
> This patch is the main part of PR85694.  The aim is to recognise at least:
>
>   signed char *a, *b, *c;
>   ...
>   for (int i = 0; i < 2048; i++)
>     c[i] = (a[i] + b[i]) >> 1;
>
> as an over-widening pattern, since the addition and shift can be done
> on shorts rather than ints.  However, it ended up being a lot more
> general than that.
>
> The current over-widening pattern detection is limited to a few simple
> cases: logical ops with immediate second operands, and shifts by a
> constant.  These cases are enough for common pixel-format conversion
> and can be detected in a peephole way.
>
> The loop above requires two generalisations of the current code: support
> for addition as well as logical ops, and support for non-constant second
> operands.  These are harder to detect in the same peephole way, so the
> patch tries to take a more global approach.
>
> The idea is to get information about the minimum operation width
> in two ways:
>
> (1) by using the range information attached to the SSA_NAMEs
>     (effectively a forward walk, since the range info is
>     context-independent).
>
> (2) by back-propagating the number of output bits required by
>     users of the result.
>
> As explained in the comments, there's a balance to be struck between
> narrowing an individual operation and fitting in with the surrounding
> code.  The approach is pretty conservative: if we could narrow an
> operation to N bits without changing its semantics, it's OK to do that if:
>
> - no operations later in the chain require more than N bits; or
>
> - all internally-defined inputs are extended from N bits or fewer,
>   and at least one of them is single-use.
>
> See the comments for the rationale.
>
> I didn't bother adding STMT_VINFO_* wrappers for the new fields
> since the code seemed more readable without.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Here's a version rebased on top of current trunk.  Changes from last time:

- reintroduce dump_generic_expr_loc, with the obvious change to the
  prototype

- fix a typo in a comment

- use vect_element_precision from the new version of 12/n.

Tested as before.  OK to install?

Richard


2018-06-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_generic_expr): Fix formatting.
	(dump_generic_expr_loc): New function.
	(dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.

Index: gcc/poly-int.h
===================================================================
*** gcc/poly-int.h	2018-06-29 12:33:06.000000000 +0100
--- gcc/poly-int.h	2018-06-29 12:33:06.721263572 +0100
*************** print_dec (const poly_int_pod<N, C> &val
*** 2420,2425 ****
--- 2420,2444 ----
  	     poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
  }
  
+ /* Use print_hex to print VALUE to FILE.  */
+ 
+ template<unsigned int N, typename C>
+ void
+ print_hex (const poly_int_pod<N, C> &value, FILE *file)
+ {
+   if (value.is_constant ())
+     print_hex (value.coeffs[0], file);
+   else
+     {
+       fprintf (file, "[");
+       for (unsigned int i = 0; i < N; ++i)
+ 	{
+ 	  print_hex (value.coeffs[i], file);
+ 	  fputc (i == N - 1 ? ']' : ',', file);
+ 	}
+     }
+ }
+ 
  /* Helper for calculating the distance between two points P1 and P2,
     in cases where known_le (P1, P2).  T1 and T2 are the types of the
     two positions, in either order.  The coefficients of P2 - P1 have
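As a quick illustration of the new print_hex output format: a
constant poly_uint64 such as 16 prints as 0x10, while a genuinely
polynomial value with coefficients 16 and 4 prints as [0x10,0x4],
one coefficient per indeterminate.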
Index: gcc/dumpfile.h
===================================================================
*** gcc/dumpfile.h	2018-06-29 12:33:06.000000000 +0100
--- gcc/dumpfile.h	2018-06-29 12:33:06.717263602 +0100
*************** extern void dump_printf_loc (dump_flags_
*** 425,430 ****
--- 425,432 ----
  			     const char *, ...) ATTRIBUTE_PRINTF_3;
  extern void dump_function (int phase, tree fn);
  extern void dump_basic_block (dump_flags_t, basic_block, int);
+ extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
+ 				   dump_flags_t, tree);
  extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
  extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
  				  dump_flags_t, gimple *, int);
*************** extern bool enable_rtl_dump_file (void);
*** 434,439 ****
--- 436,443 ----
  
  template<unsigned int N, typename C>
  void dump_dec (dump_flags_t, const poly_int<N, C> &);
+ extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+ extern void dump_hex (dump_flags_t, const poly_wide_int &);
  
  /* In tree-dump.c  */
  extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
*** gcc/dumpfile.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/dumpfile.c	2018-06-29 12:33:06.717263602 +0100
*************** dump_generic_expr (dump_flags_t dump_kin
*** 498,507 ****
--- 498,527 ----
  		   tree t)
  {
    if (dump_file && (dump_kind & pflags))
+     print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
+ 
+   if (alt_dump_file && (dump_kind & alt_flags))
+     print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
+ }
+ 
+ /* Similar to dump_generic_expr, except additionally print source location.  */
+ 
+ void
+ dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
+ 		       dump_flags_t extra_dump_flags, tree t)
+ {
+   location_t srcloc = loc.get_location_t ();
+   if (dump_file && (dump_kind & pflags))
+     {
+       dump_loc (dump_kind, dump_file, srcloc);
        print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
+     }
  
    if (alt_dump_file && (dump_kind & alt_flags))
+     {
+       dump_loc (dump_kind, alt_dump_file, srcloc);
        print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
+     }
  }
  
  /* Output a formatted message using FORMAT on appropriate dump streams.  */
*************** template void dump_dec (dump_flags_t, co
*** 573,578 ****
--- 593,620 ----
  template void dump_dec (dump_flags_t, const poly_offset_int &);
  template void dump_dec (dump_flags_t, const poly_widest_int &);
  
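+ /* Output VALUE in decimal to appropriate dump streams.  */
+ 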
+ void
+ dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+ {
+   if (dump_file && (dump_kind & pflags))
+     print_dec (value, dump_file, sgn);
+ 
+   if (alt_dump_file && (dump_kind & alt_flags))
+     print_dec (value, alt_dump_file, sgn);
+ }
+ 
+ /* Output VALUE in hexadecimal to appropriate dump streams.  */
+ 
+ void
+ dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+ {
+   if (dump_file && (dump_kind & pflags))
+     print_hex (value, dump_file);
+ 
+   if (alt_dump_file && (dump_kind & alt_flags))
+     print_hex (value, alt_dump_file);
+ }
+ 
  /* Start a dump for PHASE. Store user-supplied dump flags in
     *FLAG_PTR.  Return the number of streams opened.  Set globals
     DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h	2018-06-29 12:33:06.000000000 +0100
--- gcc/tree-vectorizer.h	2018-06-29 12:33:06.725263540 +0100
*************** typedef struct _stmt_vec_info {
*** 899,904 ****
--- 899,919 ----
  
    /* The number of scalar stmt references from active SLP instances.  */
    unsigned int num_slp_uses;
+ 
+   /* If nonzero, the lhs of the statement could be truncated to this
+      many bits without affecting any users of the result.  */
+   unsigned int min_output_precision;
+ 
+   /* If nonzero, all non-boolean input operands have the same precision,
+      and they could each be truncated to this many bits without changing
+      the result.  */
+   unsigned int min_input_precision;
+ 
+   /* If OPERATION_PRECISION is nonzero, the statement could be performed
+      on an integer with the sign and number of bits given by
+      OPERATION_SIGN and OPERATION_PRECISION without changing the result.  */
+   unsigned int operation_precision;
+   signop operation_sign;
  } *stmt_vec_info;
  
  /* Information about a gather/scatter call.  */
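To make the new fields concrete: for the addition in the motivating
loop (signed char inputs widened to int, with the result eventually
truncated back to char after a shift), I'd expect roughly
min_input_precision == 8 (both inputs come from char loads),
min_output_precision == 9 (the ">> 1" needs one bit more than the
8-bit store), and operation_precision/operation_sign == 16/SIGNED
once those bits are rounded up to a vector-friendly precision.
Illustrative values only, not taken from a dump.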
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/tree-vect-patterns.c	2018-06-29 12:33:06.721263572 +0100
*************** Software Foundation; either version 3, o
*** 47,52 ****
--- 47,86 ----
  #include "omp-simd-clone.h"
  #include "predict.h"
  
+ /* Return true if we have a useful VR_RANGE range for VAR, storing it
+    in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
+ 
+ static bool
+ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+ {
+   value_range_type vr_type = get_range_info (var, min_value, max_value);
+   wide_int nonzero = get_nonzero_bits (var);
+   signop sgn = TYPE_SIGN (TREE_TYPE (var));
+   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+ 					 nonzero, sgn) == VR_RANGE)
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ 	  dump_printf (MSG_NOTE, " has range [");
+ 	  dump_hex (MSG_NOTE, *min_value);
+ 	  dump_printf (MSG_NOTE, ", ");
+ 	  dump_hex (MSG_NOTE, *max_value);
+ 	  dump_printf (MSG_NOTE, "]\n");
+ 	}
+       return true;
+     }
+   else
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ 	  dump_printf (MSG_NOTE, " has no range info\n");
+ 	}
+       return false;
+     }
+ }
+ 
  /* Report that we've found an instance of pattern PATTERN in
     statement STMT.  */
  
*************** vect_supportable_direct_optab_p (tree ot
*** 190,229 ****
    return true;
  }
  
- /* Check whether STMT2 is in the same loop or basic block as STMT1.
-    Which of the two applies depends on whether we're currently doing
-    loop-based or basic-block-based vectorization, as determined by
-    the vinfo_for_stmt for STMT1 (which must be defined).
- 
-    If this returns true, vinfo_for_stmt for STMT2 is guaranteed
-    to be defined as well.  */
- 
- static bool
- vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
- {
-   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
-   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
- }
- 
- /* If the LHS of DEF_STMT has a single use, and that statement is
-    in the same loop or basic block, return it.  */
- 
- static gimple *
- vect_single_imm_use (gimple *def_stmt)
- {
-   tree lhs = gimple_assign_lhs (def_stmt);
-   use_operand_p use_p;
-   gimple *use_stmt;
- 
-   if (!single_imm_use (lhs, &use_p, &use_stmt))
-     return NULL;
- 
-   if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
-     return NULL;
- 
-   return use_stmt;
- }
- 
  /* Round bit precision PRECISION up to a full element.  */
  
  static unsigned int
--- 224,229 ----
*************** vect_unpromoted_value::set_op (tree op_i
*** 347,353 ****
     is possible to convert OP' back to OP using a possible sign change
     followed by a possible promotion P.  Return this OP', or null if OP is
     not a vectorizable SSA name.  If there is a promotion P, describe its
!    input in UNPROM, otherwise describe OP' in UNPROM.
  
     A successful return means that it is possible to go from OP' to OP
     via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
--- 347,355 ----
     is possible to convert OP' back to OP using a possible sign change
     followed by a possible promotion P.  Return this OP', or null if OP is
     not a vectorizable SSA name.  If there is a promotion P, describe its
!    input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
!    is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
!    have more than one user.
  
     A successful return means that it is possible to go from OP' to OP
     via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
*************** vect_unpromoted_value::set_op (tree op_i
*** 374,380 ****
  
  static tree
  vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! 				      vect_unpromoted_value *unprom)
  {
    tree res = NULL_TREE;
    tree op_type = TREE_TYPE (op);
--- 376,383 ----
  
  static tree
  vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! 				      vect_unpromoted_value *unprom,
! 				      bool *single_use_p = NULL)
  {
    tree res = NULL_TREE;
    tree op_type = TREE_TYPE (op);
*************** vect_look_through_possible_promotion (ve
*** 420,426 ****
        if (!def_stmt)
  	break;
        if (dt == vect_internal_def)
! 	caster = vinfo_for_stmt (def_stmt);
        else
  	caster = NULL;
        gassign *assign = dyn_cast <gassign *> (def_stmt);
--- 423,436 ----
        if (!def_stmt)
  	break;
        if (dt == vect_internal_def)
! 	{
! 	  caster = vinfo_for_stmt (def_stmt);
! 	  /* Ignore pattern statements, since we don't link uses for them.  */
! 	  if (single_use_p
! 	      && !STMT_VINFO_RELATED_STMT (caster)
! 	      && !has_single_use (res))
! 	    *single_use_p = false;
! 	}
        else
  	caster = NULL;
        gassign *assign = dyn_cast <gassign *> (def_stmt);
*************** vect_recog_widen_sum_pattern (vec<gimple
*** 1371,1733 ****
    return pattern_stmt;
  }
  
  
! /* Return TRUE if the operation in STMT can be performed on a smaller type.
  
!    Input:
!    STMT - a statement to check.
!    DEF - we support operations with two operands, one of which is constant.
!          The other operand can be defined by a demotion operation, or by a
!          previous statement in a sequence of over-promoted operations.  In the
!          later case DEF is used to replace that operand.  (It is defined by a
!          pattern statement we created for the previous statement in the
!          sequence).
! 
!    Input/output:
!    NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
!          NULL, it's the type of DEF.
!    STMTS - additional pattern statements.  If a pattern statement (type
!          conversion) is created in this function, its original statement is
!          added to STMTS.
  
!    Output:
!    OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
!          operands to use in the new pattern statement for STMT (will be created
!          in vect_recog_over_widening_pattern ()).
!    NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
!          statements for STMT: the first one is a type promotion and the second
!          one is the operation itself.  We return the type promotion statement
! 	 in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
!          the second pattern statement.  */
  
! static bool
! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
! 				  tree *op0, tree *op1, gimple **new_def_stmt,
! 				  vec<gimple *> *stmts)
! {
!   enum tree_code code;
!   tree const_oprnd, oprnd;
!   tree interm_type = NULL_TREE, half_type, new_oprnd, type;
!   gimple *def_stmt, *new_stmt;
!   bool first = false;
!   bool promotion;
  
!   *op0 = NULL_TREE;
!   *op1 = NULL_TREE;
!   *new_def_stmt = NULL;
  
!   if (!is_gimple_assign (stmt))
!     return false;
  
!   code = gimple_assign_rhs_code (stmt);
!   if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
!       && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
!     return false;
  
!   oprnd = gimple_assign_rhs1 (stmt);
!   const_oprnd = gimple_assign_rhs2 (stmt);
!   type = gimple_expr_type (stmt);
  
!   if (TREE_CODE (oprnd) != SSA_NAME
!       || TREE_CODE (const_oprnd) != INTEGER_CST)
!     return false;
  
!   /* If oprnd has other uses besides that in stmt we cannot mark it
!      as being part of a pattern only.  */
!   if (!has_single_use (oprnd))
!     return false;
  
!   /* If we are in the middle of a sequence, we use DEF from a previous
!      statement.  Otherwise, OPRND has to be a result of type promotion.  */
!   if (*new_type)
!     {
!       half_type = *new_type;
!       oprnd = def;
!     }
!   else
      {
!       first = true;
!       if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
! 			      &promotion)
! 	  || !promotion
! 	  || !vect_same_loop_or_bb_p (stmt, def_stmt))
!         return false;
      }
  
!   /* Can we perform the operation on a smaller type?  */
!   switch (code)
!     {
!       case BIT_IOR_EXPR:
!       case BIT_XOR_EXPR:
!       case BIT_AND_EXPR:
!         if (!int_fits_type_p (const_oprnd, half_type))
!           {
!             /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
!             if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!               return false;
! 
!             interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!             if (!int_fits_type_p (const_oprnd, interm_type))
!               return false;
!           }
! 
!         break;
! 
!       case LSHIFT_EXPR:
!         /* Try intermediate type - HALF_TYPE is not enough for sure.  */
!         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!           return false;
! 
!         /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
!           (e.g., if the original value was char, the shift amount is at most 8
!            if we want to use short).  */
!         if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
!           return false;
! 
!         interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! 
!         if (!vect_supportable_shift (code, interm_type))
!           return false;
! 
!         break;
! 
!       case RSHIFT_EXPR:
!         if (vect_supportable_shift (code, half_type))
!           break;
! 
!         /* Try intermediate type - HALF_TYPE is not supported.  */
!         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
!           return false;
! 
!         interm_type = build_nonstandard_integer_type (
!                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! 
!         if (!vect_supportable_shift (code, interm_type))
!           return false;
! 
!         break;
! 
!       default:
!         gcc_unreachable ();
!     }
! 
!   /* There are four possible cases:
!      1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
!         the first statement in the sequence)
!         a. The original, HALF_TYPE, is not enough - we replace the promotion
!            from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
!         b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
!            promotion.
!      2. OPRND is defined by a pattern statement we created.
!         a. Its type is not sufficient for the operation, we create a new stmt:
!            a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
!            this statement in NEW_DEF_STMT, and it is later put in
! 	   STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
!         b. OPRND is good to use in the new statement.  */
!   if (first)
!     {
!       if (interm_type)
!         {
!           /* Replace the original type conversion HALF_TYPE->TYPE with
!              HALF_TYPE->INTERM_TYPE.  */
!           if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
!             {
!               new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
!               /* Check if the already created pattern stmt is what we need.  */
!               if (!is_gimple_assign (new_stmt)
!                   || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
!                   || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
!                 return false;
! 
! 	      stmts->safe_push (def_stmt);
!               oprnd = gimple_assign_lhs (new_stmt);
!             }
!           else
!             {
!               /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
!               oprnd = gimple_assign_rhs1 (def_stmt);
! 	      new_oprnd = make_ssa_name (interm_type);
! 	      new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
!               STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
!               stmts->safe_push (def_stmt);
!               oprnd = new_oprnd;
!             }
!         }
!       else
!         {
!           /* Retrieve the operand before the type promotion.  */
!           oprnd = gimple_assign_rhs1 (def_stmt);
!         }
!     }
!   else
!     {
!       if (interm_type)
!         {
!           /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
! 	  new_oprnd = make_ssa_name (interm_type);
! 	  new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
!           oprnd = new_oprnd;
!           *new_def_stmt = new_stmt;
!         }
  
!       /* Otherwise, OPRND is already set.  */
      }
  
!   if (interm_type)
!     *new_type = interm_type;
!   else
!     *new_type = half_type;
  
!   *op0 = oprnd;
!   *op1 = fold_convert (*new_type, const_oprnd);
! 
!   return true;
  }
  
  
! /* Try to find a statement or a sequence of statements that can be performed
!    on a smaller type:
  
!      type x_t;
!      TYPE x_T, res0_T, res1_T;
!    loop:
!      S1  x_t = *p;
!      S2  x_T = (TYPE) x_t;
!      S3  res0_T = op (x_T, C0);
!      S4  res1_T = op (res0_T, C1);
!      S5  ... = () res1_T;  - type demotion
! 
!    where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
!    constants.
!    Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
!    be 'type' or some intermediate type.  For now, we expect S5 to be a type
!    demotion operation.  We also check that S3 and S4 have only one use.  */
  
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   gimple *stmt = stmts->pop ();
!   gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
! 	 *use_stmt = NULL;
!   tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
!   tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
!   bool first;
!   tree type = NULL;
! 
!   first = true;
!   while (1)
!     {
!       if (!vinfo_for_stmt (stmt)
!           || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
!         return NULL;
! 
!       new_def_stmt = NULL;
!       if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
!                                              &op0, &op1, &new_def_stmt,
!                                              stmts))
!         {
!           if (first)
!             return NULL;
!           else
!             break;
!         }
  
!       /* STMT can be performed on a smaller type.  Check its uses.  */
!       use_stmt = vect_single_imm_use (stmt);
!       if (!use_stmt || !is_gimple_assign (use_stmt))
!         return NULL;
! 
!       /* Create pattern statement for STMT.  */
!       vectype = get_vectype_for_scalar_type (new_type);
!       if (!vectype)
!         return NULL;
! 
!       /* We want to collect all the statements for which we create pattern
!          statetments, except for the case when the last statement in the
!          sequence doesn't have a corresponding pattern statement.  In such
!          case we associate the last pattern statement with the last statement
!          in the sequence.  Therefore, we only add the original statement to
!          the list if we know that it is not the last.  */
!       if (prev_stmt)
!         stmts->safe_push (prev_stmt);
  
!       var = vect_recog_temp_ssa_var (new_type, NULL);
!       pattern_stmt
! 	= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
!       STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
!       new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
  
!       if (dump_enabled_p ())
!         {
!           dump_printf_loc (MSG_NOTE, vect_location,
!                            "created pattern stmt: ");
!           dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
!         }
  
!       type = gimple_expr_type (stmt);
!       prev_stmt = stmt;
!       stmt = use_stmt;
! 
!       first = false;
!     }
! 
!   /* We got a sequence.  We expect it to end with a type demotion operation.
!      Otherwise, we quit (for now).  There are three possible cases: the
!      conversion is to NEW_TYPE (we don't do anything), the conversion is to
!      a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
!      NEW_TYPE differs (we create a new conversion statement).  */
!   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
!     {
!       use_lhs = gimple_assign_lhs (use_stmt);
!       use_type = TREE_TYPE (use_lhs);
!       /* Support only type demotion or signedess change.  */
!       if (!INTEGRAL_TYPE_P (use_type)
! 	  || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
!         return NULL;
  
!       /* Check that NEW_TYPE is not bigger than the conversion result.  */
!       if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
! 	return NULL;
  
!       if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
!           || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
!         {
! 	  *type_out = get_vectype_for_scalar_type (use_type);
! 	  if (!*type_out)
! 	    return NULL;
  
!           /* Create NEW_TYPE->USE_TYPE conversion.  */
! 	  new_oprnd = make_ssa_name (use_type);
! 	  pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
!           STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
! 
!           /* We created a pattern statement for the last statement in the
!              sequence, so we don't need to associate it with the pattern
!              statement created for PREV_STMT.  Therefore, we add PREV_STMT
!              to the list in order to mark it later in vect_pattern_recog_1.  */
!           if (prev_stmt)
!             stmts->safe_push (prev_stmt);
!         }
!       else
!         {
!           if (prev_stmt)
! 	    STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
! 	       = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
  
! 	  *type_out = vectype;
!         }
  
!       stmts->safe_push (use_stmt);
!     }
!   else
!     /* TODO: support general case, create a conversion to the correct type.  */
      return NULL;
  
!   /* Pattern detected.  */
!   vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
  
    return pattern_stmt;
  }
  
--- 1381,1698 ----
    return pattern_stmt;
  }
  
+ /* Recognize cases in which an operation is performed in one type WTYPE
+    but could be done more efficiently in a narrower type NTYPE.  For example,
+    if we have:
+ 
+      ATYPE a;  // narrower than NTYPE
+      BTYPE b;  // narrower than NTYPE
+      WTYPE aw = (WTYPE) a;
+      WTYPE bw = (WTYPE) b;
+      WTYPE res = aw + bw;  // only uses of aw and bw
+ 
+    then it would be more efficient to do:
+ 
+      NTYPE an = (NTYPE) a;
+      NTYPE bn = (NTYPE) b;
+      NTYPE resn = an + bn;
+      WTYPE res = (WTYPE) resn;
+ 
+    Other situations include things like:
+ 
+      ATYPE a;  // NTYPE or narrower
+      WTYPE aw = (WTYPE) a;
+      WTYPE res = aw + b;
+ 
+    when only "(NTYPE) res" is significant.  In that case it's more efficient
+    to truncate "b" and do the operation on NTYPE instead:
+ 
+      NTYPE an = (NTYPE) a;
+      NTYPE bn = (NTYPE) b;  // truncation
+      NTYPE resn = an + bn;
+      WTYPE res = (WTYPE) resn;
+ 
+    All users of "res" should then use "resn" instead, making the final
+    statement dead (not marked as relevant).  The final statement is still
+    needed to maintain the type correctness of the IR.
+ 
+    vect_determine_precisions has already determined the minimum
+    precison of the operation and the minimum precision required
+    by users of the result.  */
  
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
!   if (!last_stmt)
!     return NULL;
  
!   /* See whether we have found that this operation can be done on a
!      narrower type without changing its semantics.  */
!   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
!   unsigned int new_precision = last_stmt_info->operation_precision;
!   if (!new_precision)
!     return NULL;
  
!   vec_info *vinfo = last_stmt_info->vinfo;
!   tree lhs = gimple_assign_lhs (last_stmt);
!   tree type = TREE_TYPE (lhs);
!   tree_code code = gimple_assign_rhs_code (last_stmt);
! 
!   /* Keep the first operand of a COND_EXPR as-is: only the other two
!      operands are interesting.  */
!   unsigned int first_op = (code == COND_EXPR ? 2 : 1);
  
!   /* Check the operands.  */
!   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
!   auto_vec <vect_unpromoted_value, 3> unprom (nops);
!   unprom.quick_grow (nops);
!   unsigned int min_precision = 0;
!   bool single_use_p = false;
!   for (unsigned int i = 0; i < nops; ++i)
!     {
!       tree op = gimple_op (last_stmt, first_op + i);
!       if (TREE_CODE (op) == INTEGER_CST)
! 	unprom[i].set_op (op, vect_constant_def);
!       else if (TREE_CODE (op) == SSA_NAME)
! 	{
! 	  bool op_single_use_p = true;
! 	  if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
! 						     &op_single_use_p))
! 	    return NULL;
! 	  /* If:
  
! 	     (1) N bits of the result are needed;
! 	     (2) all inputs are widened from M<N bits; and
! 	     (3) one operand OP is a single-use SSA name
! 
! 	     we can shift the M->N widening from OP to the output
! 	     without changing the number or type of extensions involved.
! 	     This then reduces the number of copies of STMT_INFO.
! 
! 	     If instead of (3) more than one operand is a single-use SSA name,
! 	     shifting the extension to the output is even more of a win.
! 
! 	     If instead:
! 
! 	     (1) N bits of the result are needed;
! 	     (2) one operand OP2 is widened from M2<N bits;
! 	     (3) another operand OP1 is widened from M1<M2 bits; and
! 	     (4) both OP1 and OP2 are single-use
! 
! 	     the choice is between:
! 
! 	     (a) truncating OP2 to M1, doing the operation on M1,
! 		 and then widening the result to N
! 
! 	     (b) widening OP1 to M2, doing the operation on M2, and then
! 		 widening the result to N
! 
! 	     Both shift the M2->N widening of the inputs to the output.
! 	     (a) additionally shifts the M1->M2 widening to the output;
! 	     it requires fewer copies of STMT_INFO but requires an extra
! 	     M2->M1 truncation.
! 
! 	     Which is better will depend on the complexity and cost of
! 	     STMT_INFO, which is hard to predict at this stage.  However,
! 	     a clear tie-breaker in favor of (b) is the fact that the
! 	     truncation in (a) increases the length of the operation chain.
! 
! 	     If instead of (4) only one of OP1 or OP2 is single-use,
! 	     (b) is still a win over doing the operation in N bits:
! 	     it still shifts the M2->N widening on the single-use operand
! 	     to the output and reduces the number of STMT_INFO copies.
! 
! 	     If neither operand is single-use then operating on fewer than
! 	     N bits might lead to more extensions overall.  Whether it does
! 	     or not depends on global information about the vectorization
! 	     region, and whether that's a good trade-off would again
! 	     depend on the complexity and cost of the statements involved,
! 	     as well as things like register pressure that are not normally
! 	     modelled at this stage.  We therefore ignore these cases
! 	     and just optimize the clear single-use wins above.
! 
! 	     Thus we take the maximum precision of the unpromoted operands
! 	     and record whether any operand is single-use.  */
! 	  if (unprom[i].dt == vect_internal_def)
! 	    {
! 	      min_precision = MAX (min_precision,
! 				   TYPE_PRECISION (unprom[i].type));
! 	      single_use_p |= op_single_use_p;
! 	    }
! 	}
!     }
  
!   /* Although the operation could be done in operation_precision, we have
!      to balance that against introducing extra truncations or extensions.
!      Calculate the minimum precision that can be handled efficiently.
! 
!      The loop above determined that the operation could be handled
!      efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
!      extension from the inputs to the output without introducing more
!      instructions, and would reduce the number of instructions required
!      for STMT_INFO itself.
! 
!      vect_determine_precisions has also determined that the result only
!      needs min_output_precision bits.  Truncating by a factor of N
!      requires a tree of N - 1 instructions, so if TYPE is N times wider
!      than min_output_precision, doing the operation in TYPE and truncating
!      the result requires N + (N - 1) = 2N - 1 instructions per output vector.
!      In contrast:
! 
!      - truncating the input to a unary operation and doing the operation
!        in the new type requires at most N - 1 + 1 = N instructions per
!        output vector
! 
!      - doing the same for a binary operation requires at most
!        (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
! 
!      Both unary and binary operations require fewer instructions than
!      this if the operands were extended from a suitable truncated form.
!      Thus there is usually nothing to lose by doing operations in
!      min_output_precision bits, but there can be something to gain.  */
!   if (!single_use_p)
!     min_precision = last_stmt_info->min_output_precision;
!   else
!     min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
  
!   /* Apply the minimum efficient precision we just calculated.  */
!   if (new_precision < min_precision)
!     new_precision = min_precision;
!   if (new_precision >= TYPE_PRECISION (type))
!     return NULL;
  
!   vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
  
!   *type_out = get_vectype_for_scalar_type (type);
!   if (!*type_out)
!     return NULL;
  
!   /* We've found a viable pattern.  Get the new type of the operation.  */
!   bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
!   tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
! 
!   /* We specifically don't check here whether the target supports the
!      new operation, since it might be something that a later pattern
!      wants to rewrite anyway.  If targets have a minimum element size
!      for some optabs, we should pattern-match smaller ops to larger ops
!      where beneficial.  */
!   tree new_vectype = get_vectype_for_scalar_type (new_type);
!   if (!new_vectype)
!     return NULL;
  
!   if (dump_enabled_p ())
      {
!       dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
!       dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
!       dump_printf (MSG_NOTE, " to ");
!       dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
!       dump_printf (MSG_NOTE, "\n");
      }
  
!   /* Calculate the rhs operands for an operation on NEW_TYPE.  */
!   STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
!   tree ops[3] = {};
!   for (unsigned int i = 1; i < first_op; ++i)
!     ops[i - 1] = gimple_op (last_stmt, i);
!   vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
! 		       new_type, &unprom[0], new_vectype);
! 
!   /* Use the operation to produce a result of type NEW_TYPE.  */
!   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
!   gimple *pattern_stmt = gimple_build_assign (new_var, code,
! 					      ops[0], ops[1], ops[2]);
!   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
  
!   if (dump_enabled_p ())
!     {
!       dump_printf_loc (MSG_NOTE, vect_location,
! 		       "created pattern stmt: ");
!       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
      }
  
!   pattern_stmt = vect_convert_output (last_stmt_info, type,
! 				      pattern_stmt, new_vectype);
  
!   stmts->safe_push (last_stmt);
!   return pattern_stmt;
  }
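
As a concrete illustration, here is a stand-alone C sketch of the shapes
described in the comment above (purely illustrative: WTYPE = int,
NTYPE = short, and signed char inputs; none of these choices come from
the patch itself):

    /* Before: the addition is done in int, so each vector of chars
       has to be unpacked into four vectors of ints.  */
    int
    widened (signed char a, signed char b)
    {
      int aw = a;
      int bw = b;
      return aw + bw;	/* only uses of aw and bw */
    }

    /* After: the addition is done in short and only the final result
       is widened.  Equivalent for all inputs, since the sum of two
       chars always fits in short.  */
    int
    narrowed (signed char a, signed char b)
    {
      short an = a;
      short bn = b;
      short resn = (short) (an + bn);
      return resn;
    }

The instruction counting in the comment above can also be checked with
concrete numbers: storing int results as chars gives N = 4, so a binary
operation done in int plus the truncation tree costs 4 + 3 = 2N - 1 = 7
instructions per output vector; a unary operation with a truncated input
costs at most 3 + 1 = N = 4; and a binary operation with truncated inputs
at most 3 * 2 + 1 = 7, i.e. never worse than the wide form and strictly
better whenever the inputs were already extended from a narrower type.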
  
+ /* Recognize cases in which the input to a cast is wider than its
+    output, and the input is fed by a widening operation.  Fold this
+    by removing the unnecessary intermediate widening.  E.g.:
  
!      unsigned char a;
!      unsigned int b = (unsigned int) a;
!      unsigned short c = (unsigned short) b;
  
!    -->
  
!      unsigned short c = (unsigned short) a;
  
!    Although this is rare in input IR, it is an expected side-effect
!    of the over-widening pattern above.
  
!    This is beneficial also for integer-to-float conversions, if the
!    widened integer has more bits than the float, and if the unwidened
!    input doesn't.  */
  
! static gimple *
! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
! {
!   /* Check for a cast, including an integer-to-float conversion.  */
!   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
!   if (!last_stmt)
!     return NULL;
!   tree_code code = gimple_assign_rhs_code (last_stmt);
!   if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
!     return NULL;
  
!   /* Make sure that the rhs is a scalar with a natural bitsize.  */
!   tree lhs = gimple_assign_lhs (last_stmt);
!   if (!lhs)
!     return NULL;
!   tree lhs_type = TREE_TYPE (lhs);
!   scalar_mode lhs_mode;
!   if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
!       || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
!     return NULL;
  
!   /* Check for a narrowing operation (from a vector point of view).  */
!   tree rhs = gimple_assign_rhs1 (last_stmt);
!   tree rhs_type = TREE_TYPE (rhs);
!   if (!INTEGRAL_TYPE_P (rhs_type)
!       || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
!       || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
!     return NULL;
  
!   /* Try to find an unpromoted input.  */
!   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
!   vec_info *vinfo = last_stmt_info->vinfo;
!   vect_unpromoted_value unprom;
!   if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
!       || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
!     return NULL;
  
!   /* If the bits above RHS_TYPE matter, make sure that they're the
!      same when extending from UNPROM as they are when extending from RHS.  */
!   if (!INTEGRAL_TYPE_P (lhs_type)
!       && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
!     return NULL;
  
!   /* We can get the same result by casting UNPROM directly, to avoid
!      the unnecessary widening and narrowing.  */
!   vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
  
!   *type_out = get_vectype_for_scalar_type (lhs_type);
!   if (!*type_out)
      return NULL;
  
!   tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
!   gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
!   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
  
+   stmts->safe_push (last_stmt);
    return pattern_stmt;
  }
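
Both folds can be written out as compilable C (illustrative only; the
types are assumptions, chosen for the float case so that the widened
integer is wider than float but the unwidened input is not):

    unsigned short
    int_case (unsigned char a)
    {
      unsigned int b = a;		/* widening cast */
      return (unsigned short) b;	/* folds to (unsigned short) a */
    }

    float
    float_case (unsigned short a)
    {
      unsigned long long w = a;		/* 64-bit widening */
      return (float) w;			/* folds to (float) a */
    }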
  
*************** vect_recog_gather_scatter_pattern (vec<g
*** 4205,4210 ****
--- 4170,4559 ----
    return pattern_stmt;
  }
  
+ /* Return true if TYPE is a non-boolean integer type.  These are the types
+    that we want to consider for narrowing.  */
+ 
+ static bool
+ vect_narrowable_type_p (tree type)
+ {
+   return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+ }
+ 
+ /* Return true if the operation given by CODE can be truncated to N bits
+    when only N bits of the output are needed.  This is only true if bit N+1
+    of the inputs has no effect on the low N bits of the result.  */
+ 
+ static bool
+ vect_truncatable_operation_p (tree_code code)
+ {
+   switch (code)
+     {
+     case PLUS_EXPR:
+     case MINUS_EXPR:
+     case MULT_EXPR:
+     case BIT_AND_EXPR:
+     case BIT_IOR_EXPR:
+     case BIT_XOR_EXPR:
+     case COND_EXPR:
+       return true;
+ 
+     default:
+       return false;
+     }
+ }
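
The truncation property is easy to sanity-check in plain C (an
illustrative test, not part of the patch): for a truncatable code such
as PLUS_EXPR, doing the operation narrow agrees with doing it wide and
truncating, whereas a right shift shows why RSHIFT_EXPR is excluded:

    #include <assert.h>
    #include <stdint.h>

    int
    main (void)
    {
      uint16_t a = 0x1234, b = 0x00ff;
      /* PLUS_EXPR: input bits 8 and up cannot affect result bits 0-7.  */
      assert ((uint8_t) (a + b) == (uint8_t) ((uint8_t) a + (uint8_t) b));
      /* Right shifts move high input bits down into the low result
	 bits, so they are not truncatable in this sense.  */
      assert ((uint8_t) (a >> 4) != (uint8_t) ((uint8_t) a >> 4));
      return 0;
    }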
+ 
+ /* Record that STMT_INFO could be changed from operating on TYPE to
+    operating on a type with the precision and sign given by PRECISION
+    and SIGN respectively.  PRECISION is an arbitrary bit precision;
+    it might not be a whole number of bytes.  */
+ 
+ static void
+ vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+ 			 unsigned int precision, signop sign)
+ {
+   /* Round the precision up to a whole number of bytes.  */
+   precision = vect_element_precision (precision);
+   if (precision < TYPE_PRECISION (type)
+       && (!stmt_info->operation_precision
+ 	  || stmt_info->operation_precision > precision))
+     {
+       stmt_info->operation_precision = precision;
+       stmt_info->operation_sign = sign;
+     }
+ }
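
For example (assuming, per the "Round bit precision PRECISION up to a
full element" comment earlier in the file, that vect_element_precision
rounds up to at least a whole byte): a 12-bit requirement is recorded as
operation_precision 16 and a 5-bit requirement as 8, while anything that
rounds up to TYPE's own precision or beyond is simply not recorded.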
+ 
+ /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+    non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
+    is an arbitrary bit precision; it might not be a whole number of bytes.  */
+ 
+ static void
+ vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+ 			      unsigned int min_input_precision)
+ {
+   /* This operation in isolation only requires the inputs to have
+      MIN_INPUT_PRECISION of precision.  However, that doesn't mean
+      that MIN_INPUT_PRECISION is a natural precision for the chain
+      as a whole.  E.g. consider something like:
+ 
+ 	 unsigned short *x, *y;
+ 	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+ 
+      The right shift can be done on unsigned chars, and only requires the
+      result of "*x & 0xf0" to be done on unsigned chars.  But taking that
+      approach would mean turning a natural chain of single-vector unsigned
+      short operations into one that truncates "*x" and then extends
+      "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+      operation and one vector for each unsigned char operation.
+      This would be a significant pessimization.
+ 
+      Instead only propagate the maximum of this precision and the precision
+      required by the users of the result.  This means that we don't pessimize
+      the case above but continue to optimize things like:
+ 
+ 	 unsigned char *y;
+ 	 unsigned short *x;
+ 	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+ 
+      Here we would truncate two vectors of *x to a single vector of
+      unsigned chars and use single-vector unsigned char operations for
+      everything else, rather than doing two unsigned short copies of
+      "(*x & 0xf0) >> 4" and then truncating the result.  */
+   min_input_precision = MAX (min_input_precision,
+ 			     stmt_info->min_output_precision);
+ 
+   if (min_input_precision < TYPE_PRECISION (type)
+       && (!stmt_info->min_input_precision
+ 	  || stmt_info->min_input_precision > min_input_precision))
+     stmt_info->min_input_precision = min_input_precision;
+ }
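
Putting numbers to the examples above: the right shift in
"(*x & 0xf0) >> 4" could run on 8-bit inputs in both variants.  When *y
is an unsigned short, the users of the shift result need 16 bits, so
MAX (8, 16) = 16 is propagated and the chain stays in unsigned short;
when *y is an unsigned char, the users need only 8 bits,
MAX (8, 8) = 8 is propagated, and the whole chain can be narrowed.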
+ 
+ /* Subroutine of vect_determine_min_output_precision.  Return true if
+    we can calculate a reduced number of output bits for STMT_INFO,
+    whose result is LHS.  */
+ 
+ static bool
+ vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+ {
+   /* Take the maximum precision required by users of the result.  */
+   unsigned int precision = 0;
+   imm_use_iterator iter;
+   use_operand_p use;
+   FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+     {
+       gimple *use_stmt = USE_STMT (use);
+       if (is_gimple_debug (use_stmt))
+ 	continue;
+       if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+ 	return false;
+       stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+       if (!use_stmt_info->min_input_precision)
+ 	return false;
+       precision = MAX (precision, use_stmt_info->min_input_precision);
+     }
+ 
+   if (dump_enabled_p ())
+     {
+       dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+ 		       precision);
+       dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+       dump_printf (MSG_NOTE, " are significant\n");
+     }
+   stmt_info->min_output_precision = precision;
+   return true;
+ }
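
As a worked case: if LHS feeds one statement that recorded
min_input_precision 8 and another that recorded 16, min_output_precision
becomes 16.  A use outside the vectorizable region, or one whose
min_input_precision was never set, makes this return false, and the
caller falls back to the full precision of the result type.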
+ 
+ /* Calculate min_output_precision for STMT_INFO.  */
+ 
+ static void
+ vect_determine_min_output_precision (stmt_vec_info stmt_info)
+ {
+   /* We're only interested in statements with a narrowable result.  */
+   tree lhs = gimple_get_lhs (stmt_info->stmt);
+   if (!lhs
+       || TREE_CODE (lhs) != SSA_NAME
+       || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+     return;
+ 
+   if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+     stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+ }
+ 
+ /* Use range information to decide whether STMT (described by STMT_INFO)
+    could be done in a narrower type.  This is effectively a forward
+    propagation, since it uses context-independent information that applies
+    to all users of an SSA name.  */
+ 
+ static void
+ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+ {
+   tree lhs = gimple_assign_lhs (stmt);
+   if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+     return;
+ 
+   tree type = TREE_TYPE (lhs);
+   if (!vect_narrowable_type_p (type))
+     return;
+ 
+   /* First see whether we have any useful range information for the result.  */
+   unsigned int precision = TYPE_PRECISION (type);
+   signop sign = TYPE_SIGN (type);
+   wide_int min_value, max_value;
+   if (!vect_get_range_info (lhs, &min_value, &max_value))
+     return;
+ 
+   tree_code code = gimple_assign_rhs_code (stmt);
+   unsigned int nops = gimple_num_ops (stmt);
+ 
+   if (!vect_truncatable_operation_p (code))
+     /* Check that all relevant input operands are compatible, and update
+        [MIN_VALUE, MAX_VALUE] to include their ranges.  */
+     for (unsigned int i = 1; i < nops; ++i)
+       {
+ 	tree op = gimple_op (stmt, i);
+ 	if (TREE_CODE (op) == INTEGER_CST)
+ 	  {
+ 	    /* Don't require the integer to have RHS_TYPE (which it might
+ 	       not for things like shift amounts, etc.), but do require it
+ 	       to fit the type.  */
+ 	    if (!int_fits_type_p (op, type))
+ 	      return;
+ 
+ 	    min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+ 	    max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+ 	  }
+ 	else if (TREE_CODE (op) == SSA_NAME)
+ 	  {
+ 	    /* Ignore codes that don't take uniform arguments.  */
+ 	    if (!types_compatible_p (TREE_TYPE (op), type))
+ 	      return;
+ 
+ 	    wide_int op_min_value, op_max_value;
+ 	    if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+ 	      return;
+ 
+ 	    min_value = wi::min (min_value, op_min_value, sign);
+ 	    max_value = wi::max (max_value, op_max_value, sign);
+ 	  }
+ 	else
+ 	  return;
+       }
+ 
+   /* Try to switch signed types for unsigned types if we can.
+      This is better for two reasons.  First, unsigned ops tend
+      to be cheaper than signed ops.  Second, it means that we can
+      handle things like:
+ 
+ 	signed char c;
+ 	int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+ 
+      as:
+ 
+ 	signed char c;
+ 	unsigned short res_1 = (unsigned short) c & 0xff00;
+ 	int res = (int) res_1;
+ 
+      where the intermediate result res_1 has unsigned rather than
+      signed type.  */
+   if (sign == SIGNED && !wi::neg_p (min_value))
+     sign = UNSIGNED;
+ 
+   /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
+   unsigned int precision1 = wi::min_precision (min_value, sign);
+   unsigned int precision2 = wi::min_precision (max_value, sign);
+   unsigned int value_precision = MAX (precision1, precision2);
+   if (value_precision >= precision)
+     return;
+ 
+   if (dump_enabled_p ())
+     {
+       dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ 		       " without loss of precision: ",
+ 		       sign == SIGNED ? "signed" : "unsigned",
+ 		       value_precision);
+       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+     }
+ 
+   vect_set_operation_type (stmt_info, type, value_precision, sign);
+   vect_set_min_input_precision (stmt_info, type, value_precision);
+ }
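
The signed-to-unsigned switch above can be written out as a compilable,
equivalent pair of functions (an illustration of the rewrite that the
recorded precisions enable, not code from the patch):

    int
    before (signed char c)
    {
      return (int) c & 0xff00;	/* range [0x0000, 0xff00] */
    }

    int
    after (signed char c)
    {
      /* 16 unsigned bits cover [0x0000, 0xff00]; bits 8-15 of the two
	 sign extensions agree and everything above bit 15 is masked
	 out, so the functions return the same value for every c.  */
      unsigned short res_1 = (unsigned short) c & 0xff00;
      return (int) res_1;
    }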
+ 
+ /* Use information about the users of STMT's result to decide whether
+    STMT (described by STMT_INFO) could be done in a narrower type.
+    This is effectively a backward propagation.  */
+ 
+ static void
+ vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+ {
+   tree_code code = gimple_assign_rhs_code (stmt);
+   unsigned int opno = (code == COND_EXPR ? 2 : 1);
+   tree type = TREE_TYPE (gimple_op (stmt, opno));
+   if (!vect_narrowable_type_p (type))
+     return;
+ 
+   unsigned int precision = TYPE_PRECISION (type);
+   unsigned int operation_precision, min_input_precision;
+   switch (code)
+     {
+     CASE_CONVERT:
+       /* Only the bits that contribute to the output matter.  Don't change
+ 	 the precision of the operation itself.  */
+       operation_precision = precision;
+       min_input_precision = stmt_info->min_output_precision;
+       break;
+ 
+     case LSHIFT_EXPR:
+     case RSHIFT_EXPR:
+       {
+ 	tree shift = gimple_assign_rhs2 (stmt);
+ 	if (TREE_CODE (shift) != INTEGER_CST
+ 	    || !wi::ltu_p (wi::to_widest (shift), precision))
+ 	  return;
+ 	unsigned int const_shift = TREE_INT_CST_LOW (shift);
+ 	if (code == LSHIFT_EXPR)
+ 	  {
+ 	    /* We need CONST_SHIFT fewer bits of the input.  */
+ 	    operation_precision = stmt_info->min_output_precision;
+ 	    min_input_precision = (MAX (operation_precision, const_shift)
+ 				    - const_shift);
+ 	  }
+ 	else
+ 	  {
+ 	    /* We need CONST_SHIFT extra bits to do the operation.  */
+ 	    operation_precision = (stmt_info->min_output_precision
+ 				   + const_shift);
+ 	    min_input_precision = operation_precision;
+ 	  }
+ 	break;
+       }
+ 
+     default:
+       if (vect_truncatable_operation_p (code))
+ 	{
+ 	  /* Input bit N has no effect on output bits N-1 and lower.  */
+ 	  operation_precision = stmt_info->min_output_precision;
+ 	  min_input_precision = operation_precision;
+ 	  break;
+ 	}
+       return;
+     }
+ 
+   if (operation_precision < precision)
+     {
+       if (dump_enabled_p ())
+ 	{
+ 	  dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ 			   " without affecting users: ",
+ 			   TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+ 			   operation_precision);
+ 	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ 	}
+       vect_set_operation_type (stmt_info, type, operation_precision,
+ 			       TYPE_SIGN (type));
+     }
+   vect_set_min_input_precision (stmt_info, type, min_input_precision);
+ }
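
Worked instances of the two shift cases (the numbers are hypothetical):
if only the low 8 bits of "x << 3" matter, the switch computes
operation_precision = 8 and min_input_precision = MAX (8, 3) - 3 = 5,
since input bits 5 and up only affect output bits 8 and up;
vect_set_min_input_precision then raises the 5 back to the 8 bits that
this statement's own users need before recording it.  If only the low
8 bits of "x >> 3" matter, both values become 8 + 3 = 11, because
output bit 7 comes from input bit 10.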
+ 
+ /* Handle vect_determine_precisions for STMT_INFO, given that we
+    have already done so for the users of its result.  */
+ 
+ void
+ vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+ {
+   vect_determine_min_output_precision (stmt_info);
+   if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+     {
+       vect_determine_precisions_from_range (stmt_info, stmt);
+       vect_determine_precisions_from_users (stmt_info, stmt);
+     }
+ }
+ 
+ /* Walk backwards through the vectorizable region to determine the
+    values of these fields:
+ 
+    - min_output_precision
+    - min_input_precision
+    - operation_precision
+    - operation_sign.  */
+ 
+ void
+ vect_determine_precisions (vec_info *vinfo)
+ {
+   DUMP_VECT_SCOPE ("vect_determine_precisions");
+ 
+   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+     {
+       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+       basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+       unsigned int nbbs = loop->num_nodes;
+ 
+       for (unsigned int i = 0; i < nbbs; i++)
+ 	{
+ 	  basic_block bb = bbs[nbbs - i - 1];
+ 	  for (gimple_stmt_iterator si = gsi_last_bb (bb);
+ 	       !gsi_end_p (si); gsi_prev (&si))
+ 	    vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+ 	}
+     }
+   else
+     {
+       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+       gimple_stmt_iterator si = bb_vinfo->region_end;
+       gimple *stmt;
+       do
+ 	{
+ 	  if (!gsi_stmt (si))
+ 	    si = gsi_last_bb (bb_vinfo->bb);
+ 	  else
+ 	    gsi_prev (&si);
+ 	  stmt = gsi_stmt (si);
+ 	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ 	  if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+ 	    vect_determine_stmt_precisions (stmt_info);
+ 	}
+       while (stmt != gsi_stmt (bb_vinfo->region_begin));
+     }
+ }
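
As a short worked trace of the backward walk (a hypothetical loop body,
written as C):

    void
    f (signed char *a, signed char *b, int i)
    {
      int t1 = (int) a[i];	/* visited last */
      int t2 = t1 << 3;
      b[i] = (signed char) t2;	/* visited first */
    }

The truncating store establishes that only the low 8 bits of t2 are
significant; the shift therefore gets min_output_precision 8 and records
operation_precision 8; and the widening cast passes the 8-bit
requirement back to the load.  vect_recog_over_widening_pattern can then
do the shift on 8 bits.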
+ 
  typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
  
  struct vect_recog_func
*************** struct vect_recog_func
*** 4217,4229 ****
   taken, which usually means the more complex one needs to precede the
   less complex ones (widen_sum only after dot_prod or sad, for example).  */
  static vect_recog_func vect_vect_recog_func_ptrs[] = {
    { vect_recog_widen_mult_pattern, "widen_mult" },
    { vect_recog_dot_prod_pattern, "dot_prod" },
    { vect_recog_sad_pattern, "sad" },
    { vect_recog_widen_sum_pattern, "widen_sum" },
    { vect_recog_pow_pattern, "pow" },
    { vect_recog_widen_shift_pattern, "widen_shift" },
-   { vect_recog_over_widening_pattern, "over_widening" },
    { vect_recog_rotate_pattern, "rotate" },
    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
    { vect_recog_divmod_pattern, "divmod" },
--- 4566,4579 ----
   taken, which usually means the more complex one needs to precede the
   less complex ones (widen_sum only after dot_prod or sad, for example).  */
  static vect_recog_func vect_vect_recog_func_ptrs[] = {
+   { vect_recog_over_widening_pattern, "over_widening" },
+   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
    { vect_recog_widen_mult_pattern, "widen_mult" },
    { vect_recog_dot_prod_pattern, "dot_prod" },
    { vect_recog_sad_pattern, "sad" },
    { vect_recog_widen_sum_pattern, "widen_sum" },
    { vect_recog_pow_pattern, "pow" },
    { vect_recog_widen_shift_pattern, "widen_shift" },
    { vect_recog_rotate_pattern, "rotate" },
    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
    { vect_recog_divmod_pattern, "divmod" },
*************** vect_pattern_recog (vec_info *vinfo)
*** 4497,4502 ****
--- 4847,4854 ----
    unsigned int i, j;
    auto_vec<gimple *, 1> stmts_to_replace;
  
+   vect_determine_precisions (vinfo);
+ 
    DUMP_VECT_SCOPE ("vect_pattern_recog");
  
    if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 62,69 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 62,70 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 58,64 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 58,66 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,63 ****
    return 0;
  }
  
! /* Final value stays in int, so no over-widening is detected at the moment.  */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,68 ----
    return 0;
  }
  
! /* This is an over-widening even though the final result is still an int.
!    It's better to do one vector of ops on chars and then widen than to
!    widen and then do 4 vectors of ops on ints.  */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,63 ****
    return 0;
  }
  
! /* Final value stays in int, so no over-widening is detected at the moment.  */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,68 ----
    return 0;
  }
  
! /* This is an over-widening even though the final result is still an int.
!    It's better to do one vector of ops on chars and then widen than to
!    widen and then do 4 vectors of ops on ints.  */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,62 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 57,65 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 59,65 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 59,67 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 66,73 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 66,74 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 62,68 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
--- 62,70 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,66 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ /* Deliberate use of signed >>.  */
+ #define DEF_LOOP(SIGNEDNESS)			\
+   void __attribute__ ((noipa))			\
+   f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+ 		  SIGNEDNESS char *restrict b,	\
+ 		  SIGNEDNESS char *restrict c)	\
+   {						\
+     a[0] = (b[0] + c[0]) >> 1;			\
+     a[1] = (b[1] + c[1]) >> 1;			\
+     a[2] = (b[2] + c[2]) >> 1;			\
+     a[3] = (b[3] + c[3]) >> 1;			\
+     a[4] = (b[4] + c[4]) >> 1;			\
+     a[5] = (b[5] + c[5]) >> 1;			\
+     a[6] = (b[6] + c[6]) >> 1;			\
+     a[7] = (b[7] + c[7]) >> 1;			\
+     a[8] = (b[8] + c[8]) >> 1;			\
+     a[9] = (b[9] + c[9]) >> 1;			\
+     a[10] = (b[10] + c[10]) >> 1;		\
+     a[11] = (b[11] + c[11]) >> 1;		\
+     a[12] = (b[12] + c[12]) >> 1;		\
+     a[13] = (b[13] + c[13]) >> 1;		\
+     a[14] = (b[14] + c[14]) >> 1;		\
+     a[15] = (b[15] + c[15]) >> 1;		\
+   }
+ 
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+ 
+ #define N 16
+ 
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C)		\
+   {							\
+     SIGNEDNESS char a[N], b[N], c[N];			\
+     for (int i = 0; i < N; ++i)				\
+       {							\
+ 	b[i] = BASE_B + i * 15;				\
+ 	c[i] = BASE_C + i * 14;				\
+ 	asm volatile ("" ::: "memory");			\
+       }							\
+     f_##SIGNEDNESS (a, b, c);				\
+     for (int i = 0; i < N; ++i)				\
+       if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
+ 	__builtin_abort ();				\
+   }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   TEST_LOOP (signed, -128, -120);
+   TEST_LOOP (unsigned, 4, 10);
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,65 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ /* Deliberate use of signed >>.  */
+ #define DEF_LOOP(SIGNEDNESS)			\
+   void __attribute__ ((noipa))			\
+   f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+ 		  SIGNEDNESS char *restrict b,	\
+ 		  SIGNEDNESS char c)		\
+   {						\
+     a[0] = (b[0] + c) >> 1;			\
+     a[1] = (b[1] + c) >> 1;			\
+     a[2] = (b[2] + c) >> 1;			\
+     a[3] = (b[3] + c) >> 1;			\
+     a[4] = (b[4] + c) >> 1;			\
+     a[5] = (b[5] + c) >> 1;			\
+     a[6] = (b[6] + c) >> 1;			\
+     a[7] = (b[7] + c) >> 1;			\
+     a[8] = (b[8] + c) >> 1;			\
+     a[9] = (b[9] + c) >> 1;			\
+     a[10] = (b[10] + c) >> 1;			\
+     a[11] = (b[11] + c) >> 1;			\
+     a[12] = (b[12] + c) >> 1;			\
+     a[13] = (b[13] + c) >> 1;			\
+     a[14] = (b[14] + c) >> 1;			\
+     a[15] = (b[15] + c) >> 1;			\
+   }
+ 
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+ 
+ #define N 16
+ 
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, C)		\
+   {							\
+     SIGNEDNESS char a[N], b[N], c[N];			\
+     for (int i = 0; i < N; ++i)				\
+       {							\
+ 	b[i] = BASE_B + i * 15;				\
+ 	asm volatile ("" ::: "memory");			\
+       }							\
+     f_##SIGNEDNESS (a, b, C);				\
+     for (int i = 0; i < N; ++i)				\
+       if (a[i] != (BASE_B + C + i * 15) >> 1)		\
+ 	__builtin_abort ();				\
+   }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   TEST_LOOP (signed, -128, -120);
+   TEST_LOOP (unsigned, 4, 250);
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   /* Deliberate use of signed >>.  */
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) >> 1;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ 
+ #include "vect-over-widen-5.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #define D -120
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+ {
+   int promoted_d = d;
+   for (int i = 0; i < N; ++i)
+     /* Deliberate use of signed >>.  */
+     a[i] = (b[i] + c[i] + promoted_d) >> 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, D);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #define D 251
+ #endif
+ 
+ #include "vect-over-widen-7.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     {
+       /* Deliberate use of signed >>.  */
+       int res = b[i] + c[i];
+       a[i] = (res + (res >> 1)) >> 2;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     {
+       int res = BASE_B + BASE_C + i * 9;
+       if (a[i] != ((res + (res >> 1)) >> 2))
+ 	__builtin_abort ();
+     }
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-9.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,63 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+ 
+ #define N 50
+ 
+ /* Both range analysis and backward propagation from the truncation show
+    that these calculations can be done in SIGNEDNESS short, with "res"
+    being extended for the store to d[i].  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c, int *restrict d)
+ {
+   for (int i = 0; i < N; ++i)
+     {
+       /* Deliberate use of signed >>.  */
+       int res = b[i] + c[i];
+       a[i] = (res + (res >> 1)) >> 2;
+       d[i] = res;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   int d[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d);
+   for (int i = 0; i < N; ++i)
+     {
+       int res = BASE_B + BASE_C + i * 9;
+       if (a[i] != ((res + (res >> 1)) >> 2))
+ 	__builtin_abort ();
+       if (d[i] != res)
+ 	__builtin_abort ();
+     }
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-11.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+ 
+ #define N 50
+ 
+ /* We rely on range analysis to show that these calculations can be done
+    in SIGNEDNESS short.  */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) / 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   SIGNEDNESS char a[N], b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-13.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+ 
+ #define N 50
+ 
+ /* We rely on range analysis to show that these calculations can be done
+    in SIGNEDNESS short, with the result being extended to int for the
+    store.  */
+ void __attribute__ ((noipa))
+ f (int *restrict a, SIGNEDNESS char *restrict b,
+    SIGNEDNESS char *restrict c)
+ {
+   for (int i = 0; i < N; ++i)
+     a[i] = (b[i] + c[i]) / 2;
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   int a[N];
+   SIGNEDNESS char b[N], c[N];
+   for (int i = 0; i < N; ++i)
+     {
+       b[i] = BASE_B + i * 5;
+       c[i] = BASE_C + i * 4;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+ 
+ #include "vect-over-widen-15.c"
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 1024
+ 
+ /* This should not be treated as an over-widening pattern, even though
+    "(b[i] & 0xef) | 0x80)" could be done in unsigned chars.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned short *restrict a, unsigned short *restrict b)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+       a[i] = foo;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned short a[N], b[N];
+   for (int i = 0; i < N; ++i)
+     {
+       a[i] = i;
+       b[i] = i * 3;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 1024
+ 
+ /* This should be treated as an over-widening pattern: we can truncate
+    b to unsigned char after loading it and do all the computation in
+    unsigned char.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned char *restrict a, unsigned short *restrict b)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+       a[i] = foo;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned char a[N];
+   unsigned short b[N];
+   for (int i = 0; i < N; ++i)
+     {
+       a[i] = i;
+       b[i] = i * 3;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       unsigned int di = d[i];
+       unsigned int ei = e[i];
+       a[i] = di;
+       b[i] = ei;
+       c[i] = di + ei;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       int di = d[i];
+       int ei = e[i];
+       a[i] = di;
+       b[i] = ei;
+       c[i] = di + ei;
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
*** /dev/null	2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c	2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ #include "tree-vect.h"
+ 
+ #define N 111
+ 
+ /* This shouldn't be treated as an over-widening operation: it's better
+    to reuse the extensions of di and ei for di + ei than to add them
+    as shorts and introduce a third extension.  */
+ 
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+    unsigned int *restrict c, unsigned char *restrict d,
+    unsigned char *restrict e)
+ {
+   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+     {
+       a[i] = d[i];
+       b[i] = e[i];
+       c[i] = d[i] + e[i];
+     }
+ }
+ 
+ int
+ main (void)
+ {
+   check_vect ();
+ 
+   unsigned int a[N], b[N], c[N];
+   unsigned char d[N], e[N];
+   for (int i = 0; i < N; ++i)
+     {
+       d[i] = i * 2 + 3;
+       e[i] = i + 100;
+       asm volatile ("" ::: "memory");
+     }
+   f (a, b, c, d, e);
+   for (int i = 0; i < N; ++i)
+     if (a[i] != i * 2 + 3
+ 	|| b[i] != i + 100
+ 	|| c[i] != i * 3 + 103)
+       __builtin_abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */


* Re: [14/n] PR85694: Rework overwidening detection
  2018-06-29 12:56 ` Richard Sandiford
@ 2018-07-02 11:02   ` Christophe Lyon
  2018-07-02 13:37     ` Richard Sandiford
  2018-07-02 13:12   ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Christophe Lyon @ 2018-07-02 11:02 UTC (permalink / raw)
  To: gcc Patches, Richard Sandiford

On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > This patch is the main part of PR85694.  The aim is to recognise at least:
> >
> >   signed char *a, *b, *c;
> >   ...
> >   for (int i = 0; i < 2048; i++)
> >     c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints.  However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant.  These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands.  These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> >     (effectively a forward walk, since the range info is
> >     context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> >     users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code.  The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> >   and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Here's a version rebased on top of current trunk.  Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
>   prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before.  OK to install?
>

Hi Richard,

This patch introduces regressions on arm-none-linux-gnueabihf:
    gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
    gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
    gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
    gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
    gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
    gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
    gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
    gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2

Christophe

> Richard
>
>
> 2018-06-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * poly-int.h (print_hex): New function.
>         * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
>         * dumpfile.c (dump_generic_expr): Fix formatting.
>         (dump_generic_expr_loc): New function.
>         (dump_dec, dump_hex): New poly_wide_int functions.
>         * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
>         min_input_precision, operation_precision and operation_sign.
>         * tree-vect-patterns.c (vect_get_range_info): New function.
>         (vect_same_loop_or_bb_p, vect_single_imm_use)
>         (vect_operation_fits_smaller_type): Delete.
>         (vect_look_through_possible_promotion): Add an optional
>         single_use_p parameter.
>         (vect_recog_over_widening_pattern): Rewrite to use new
>         stmt_vec_info information.  Handle one operation at a time.
>         (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
>         (vect_truncatable_operation_p, vect_set_operation_type)
>         (vect_set_min_input_precision): New functions.
>         (vect_determine_min_output_precision_1): Likewise.
>         (vect_determine_min_output_precision): Likewise.
>         (vect_determine_precisions_from_range): Likewise.
>         (vect_determine_precisions_from_users): Likewise.
>         (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
>         (vect_vect_recog_func_ptrs): Put over_widening first.
>         Add cast_forwprop.
>         (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
>         over-widening messages.
>         * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
>         * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
>         * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-5.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-6.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-7.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-8.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-9.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-10.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-11.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-12.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-13.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-14.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-15.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-16.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-17.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-18.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-19.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-20.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-21.c: Likewise.
>
> Index: gcc/poly-int.h
> ===================================================================
> *** gcc/poly-int.h      2018-06-29 12:33:06.000000000 +0100
> --- gcc/poly-int.h      2018-06-29 12:33:06.721263572 +0100
> *************** print_dec (const poly_int_pod<N, C> &val
> *** 2420,2425 ****
> --- 2420,2444 ----
>              poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
>   }
>
> + /* Use print_hex to print VALUE to FILE.  */
> +
> + template<unsigned int N, typename C>
> + void
> + print_hex (const poly_int_pod<N, C> &value, FILE *file)
> + {
> +   if (value.is_constant ())
> +     print_hex (value.coeffs[0], file);
> +   else
> +     {
> +       fprintf (file, "[");
> +       for (unsigned int i = 0; i < N; ++i)
> +       {
> +         print_hex (value.coeffs[i], file);
> +         fputc (i == N - 1 ? ']' : ',', file);
> +       }
> +     }
> + }
> +
>   /* Helper for calculating the distance between two points P1 and P2,
>      in cases where known_le (P1, P2).  T1 and T2 are the types of the
>      two positions, in either order.  The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> *** gcc/dumpfile.h      2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.h      2018-06-29 12:33:06.717263602 +0100
> *************** extern void dump_printf_loc (dump_flags_
> *** 425,430 ****
> --- 425,432 ----
>                              const char *, ...) ATTRIBUTE_PRINTF_3;
>   extern void dump_function (int phase, tree fn);
>   extern void dump_basic_block (dump_flags_t, basic_block, int);
> + extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
> +                                  dump_flags_t, tree);
>   extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
>   extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
>                                   dump_flags_t, gimple *, int);
> *************** extern bool enable_rtl_dump_file (void);
> *** 434,439 ****
> --- 436,443 ----
>
>   template<unsigned int N, typename C>
>   void dump_dec (dump_flags_t, const poly_int<N, C> &);
> + extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> + extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
>   /* In tree-dump.c  */
>   extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> *** gcc/dumpfile.c      2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.c      2018-06-29 12:33:06.717263602 +0100
> *************** dump_generic_expr (dump_flags_t dump_kin
> *** 498,507 ****
> --- 498,527 ----
>                    tree t)
>   {
>     if (dump_file && (dump_kind & pflags))
> +     print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> +
> + /* Similar to dump_generic_expr, except additionally print source location.  */
> +
> + void
> + dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
> +                      dump_flags_t extra_dump_flags, tree t)
> + {
> +   location_t srcloc = loc.get_location_t ();
> +   if (dump_file && (dump_kind & pflags))
> +     {
> +       dump_loc (dump_kind, dump_file, srcloc);
>         print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +     }
>
>     if (alt_dump_file && (dump_kind & alt_flags))
> +     {
> +       dump_loc (dump_kind, alt_dump_file, srcloc);
>         print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> +     }
>   }
>
>   /* Output a formatted message using FORMAT on appropriate dump streams.  */
> *************** template void dump_dec (dump_flags_t, co
> *** 573,578 ****
> --- 593,620 ----
>   template void dump_dec (dump_flags_t, const poly_offset_int &);
>   template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> + void
> + dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> + {
> +   if (dump_file && (dump_kind & pflags))
> +     print_dec (value, dump_file, sgn);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_dec (value, alt_dump_file, sgn);
> + }
> +
> + /* Output VALUE in hexadecimal to appropriate dump streams.  */
> +
> + void
> + dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> + {
> +   if (dump_file && (dump_kind & pflags))
> +     print_hex (value, dump_file);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_hex (value, alt_dump_file);
> + }
> +
>   /* Start a dump for PHASE. Store user-supplied dump flags in
>      *FLAG_PTR.  Return the number of streams opened.  Set globals
>      DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h       2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vectorizer.h       2018-06-29 12:33:06.725263540 +0100
> *************** typedef struct _stmt_vec_info {
> *** 899,904 ****
> --- 899,919 ----
>
>     /* The number of scalar stmt references from active SLP instances.  */
>     unsigned int num_slp_uses;
> +
> +   /* If nonzero, the lhs of the statement could be truncated to this
> +      many bits without affecting any users of the result.  */
> +   unsigned int min_output_precision;
> +
> +   /* If nonzero, all non-boolean input operands have the same precision,
> +      and they could each be truncated to this many bits without changing
> +      the result.  */
> +   unsigned int min_input_precision;
> +
> +   /* If OPERATION_BITS is nonzero, the statement could be performed on
> +      an integer with the sign and number of bits given by OPERATION_SIGN
> +      and OPERATION_BITS without changing the result.  */
> +   unsigned int operation_precision;
> +   signop operation_sign;
>   } *stmt_vec_info;
>
>   /* Information about a gather/scatter call.  */
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c    2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vect-patterns.c    2018-06-29 12:33:06.721263572 +0100
> *************** Software Foundation; either version 3, o
> *** 47,52 ****
> --- 47,86 ----
>   #include "omp-simd-clone.h"
>   #include "predict.h"
>
> + /* Return true if we have a useful VR_RANGE range for VAR, storing it
> +    in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
> +
> + static bool
> + vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> + {
> +   value_range_type vr_type = get_range_info (var, min_value, max_value);
> +   wide_int nonzero = get_nonzero_bits (var);
> +   signop sgn = TYPE_SIGN (TREE_TYPE (var));
> +   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> +                                        nonzero, sgn) == VR_RANGE)
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has range [");
> +         dump_hex (MSG_NOTE, *min_value);
> +         dump_printf (MSG_NOTE, ", ");
> +         dump_hex (MSG_NOTE, *max_value);
> +         dump_printf (MSG_NOTE, "]\n");
> +       }
> +       return true;
> +     }
> +   else
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has no range info\n");
> +       }
> +       return false;
> +     }
> + }
> +
>   /* Report that we've found an instance of pattern PATTERN in
>      statement STMT.  */
>
> *************** vect_supportable_direct_optab_p (tree ot
> *** 190,229 ****
>     return true;
>   }
>
> - /* Check whether STMT2 is in the same loop or basic block as STMT1.
> -    Which of the two applies depends on whether we're currently doing
> -    loop-based or basic-block-based vectorization, as determined by
> -    the vinfo_for_stmt for STMT1 (which must be defined).
> -
> -    If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> -    to be defined as well.  */
> -
> - static bool
> - vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> - {
> -   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> -   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> - }
> -
> - /* If the LHS of DEF_STMT has a single use, and that statement is
> -    in the same loop or basic block, return it.  */
> -
> - static gimple *
> - vect_single_imm_use (gimple *def_stmt)
> - {
> -   tree lhs = gimple_assign_lhs (def_stmt);
> -   use_operand_p use_p;
> -   gimple *use_stmt;
> -
> -   if (!single_imm_use (lhs, &use_p, &use_stmt))
> -     return NULL;
> -
> -   if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> -     return NULL;
> -
> -   return use_stmt;
> - }
> -
>   /* Round bit precision PRECISION up to a full element.  */
>
>   static unsigned int
> --- 224,229 ----
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 347,353 ****
>      is possible to convert OP' back to OP using a possible sign change
>      followed by a possible promotion P.  Return this OP', or null if OP is
>      not a vectorizable SSA name.  If there is a promotion P, describe its
> !    input in UNPROM, otherwise describe OP' in UNPROM.
>
>      A successful return means that it is possible to go from OP' to OP
>      via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
> --- 347,355 ----
>      is possible to convert OP' back to OP using a possible sign change
>      followed by a possible promotion P.  Return this OP', or null if OP is
>      not a vectorizable SSA name.  If there is a promotion P, describe its
> !    input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
> !    is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> !    have more than one user.
>
>      A successful return means that it is possible to go from OP' to OP
>      via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 374,380 ****
>
>   static tree
>   vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> !                                     vect_unpromoted_value *unprom)
>   {
>     tree res = NULL_TREE;
>     tree op_type = TREE_TYPE (op);
> --- 376,383 ----
>
>   static tree
>   vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> !                                     vect_unpromoted_value *unprom,
> !                                     bool *single_use_p = NULL)
>   {
>     tree res = NULL_TREE;
>     tree op_type = TREE_TYPE (op);
> *************** vect_look_through_possible_promotion (ve
> *** 420,426 ****
>         if (!def_stmt)
>         break;
>         if (dt == vect_internal_def)
> !       caster = vinfo_for_stmt (def_stmt);
>         else
>         caster = NULL;
>         gassign *assign = dyn_cast <gassign *> (def_stmt);
> --- 423,436 ----
>         if (!def_stmt)
>         break;
>         if (dt == vect_internal_def)
> !       {
> !         caster = vinfo_for_stmt (def_stmt);
> !         /* Ignore pattern statements, since we don't link uses for them.  */
> !         if (single_use_p
> !             && !STMT_VINFO_RELATED_STMT (caster)
> !             && !has_single_use (res))
> !           *single_use_p = false;
> !       }
>         else
>         caster = NULL;
>         gassign *assign = dyn_cast <gassign *> (def_stmt);
> *************** vect_recog_widen_sum_pattern (vec<gimple
> *** 1371,1733 ****
>     return pattern_stmt;
>   }
>
>
> ! /* Return TRUE if the operation in STMT can be performed on a smaller type.
>
> !    Input:
> !    STMT - a statement to check.
> !    DEF - we support operations with two operands, one of which is constant.
> !          The other operand can be defined by a demotion operation, or by a
> !          previous statement in a sequence of over-promoted operations.  In the
> !          later case DEF is used to replace that operand.  (It is defined by a
> !          pattern statement we created for the previous statement in the
> !          sequence).
> !
> !    Input/output:
> !    NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
> !          NULL, it's the type of DEF.
> !    STMTS - additional pattern statements.  If a pattern statement (type
> !          conversion) is created in this function, its original statement is
> !          added to STMTS.
>
> !    Output:
> !    OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> !          operands to use in the new pattern statement for STMT (will be created
> !          in vect_recog_over_widening_pattern ()).
> !    NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> !          statements for STMT: the first one is a type promotion and the second
> !          one is the operation itself.  We return the type promotion statement
> !        in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> !          the second pattern statement.  */
>
> ! static bool
> ! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> !                                 tree *op0, tree *op1, gimple **new_def_stmt,
> !                                 vec<gimple *> *stmts)
> ! {
> !   enum tree_code code;
> !   tree const_oprnd, oprnd;
> !   tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> !   gimple *def_stmt, *new_stmt;
> !   bool first = false;
> !   bool promotion;
>
> !   *op0 = NULL_TREE;
> !   *op1 = NULL_TREE;
> !   *new_def_stmt = NULL;
>
> !   if (!is_gimple_assign (stmt))
> !     return false;
>
> !   code = gimple_assign_rhs_code (stmt);
> !   if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> !       && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> !     return false;
>
> !   oprnd = gimple_assign_rhs1 (stmt);
> !   const_oprnd = gimple_assign_rhs2 (stmt);
> !   type = gimple_expr_type (stmt);
>
> !   if (TREE_CODE (oprnd) != SSA_NAME
> !       || TREE_CODE (const_oprnd) != INTEGER_CST)
> !     return false;
>
> !   /* If oprnd has other uses besides that in stmt we cannot mark it
> !      as being part of a pattern only.  */
> !   if (!has_single_use (oprnd))
> !     return false;
>
> !   /* If we are in the middle of a sequence, we use DEF from a previous
> !      statement.  Otherwise, OPRND has to be a result of type promotion.  */
> !   if (*new_type)
> !     {
> !       half_type = *new_type;
> !       oprnd = def;
> !     }
> !   else
>       {
> !       first = true;
> !       if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> !                             &promotion)
> !         || !promotion
> !         || !vect_same_loop_or_bb_p (stmt, def_stmt))
> !         return false;
>       }
>
> !   /* Can we perform the operation on a smaller type?  */
> !   switch (code)
> !     {
> !       case BIT_IOR_EXPR:
> !       case BIT_XOR_EXPR:
> !       case BIT_AND_EXPR:
> !         if (!int_fits_type_p (const_oprnd, half_type))
> !           {
> !             /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
> !             if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !               return false;
> !
> !             interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !             if (!int_fits_type_p (const_oprnd, interm_type))
> !               return false;
> !           }
> !
> !         break;
> !
> !       case LSHIFT_EXPR:
> !         /* Try intermediate type - HALF_TYPE is not enough for sure.  */
> !         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !           return false;
> !
> !         /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> !           (e.g., if the original value was char, the shift amount is at most 8
> !            if we want to use short).  */
> !         if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> !           return false;
> !
> !         interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> !         if (!vect_supportable_shift (code, interm_type))
> !           return false;
> !
> !         break;
> !
> !       case RSHIFT_EXPR:
> !         if (vect_supportable_shift (code, half_type))
> !           break;
> !
> !         /* Try intermediate type - HALF_TYPE is not supported.  */
> !         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !           return false;
> !
> !         interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> !         if (!vect_supportable_shift (code, interm_type))
> !           return false;
> !
> !         break;
> !
> !       default:
> !         gcc_unreachable ();
> !     }
> !
> !   /* There are four possible cases:
> !      1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> !         the first statement in the sequence)
> !         a. The original, HALF_TYPE, is not enough - we replace the promotion
> !            from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> !         b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> !            promotion.
> !      2. OPRND is defined by a pattern statement we created.
> !         a. Its type is not sufficient for the operation, we create a new stmt:
> !            a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
> !            this statement in NEW_DEF_STMT, and it is later put in
> !          STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> !         b. OPRND is good to use in the new statement.  */
> !   if (first)
> !     {
> !       if (interm_type)
> !         {
> !           /* Replace the original type conversion HALF_TYPE->TYPE with
> !              HALF_TYPE->INTERM_TYPE.  */
> !           if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> !             {
> !               new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> !               /* Check if the already created pattern stmt is what we need.  */
> !               if (!is_gimple_assign (new_stmt)
> !                   || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> !                   || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> !                 return false;
> !
> !             stmts->safe_push (def_stmt);
> !               oprnd = gimple_assign_lhs (new_stmt);
> !             }
> !           else
> !             {
> !               /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
> !               oprnd = gimple_assign_rhs1 (def_stmt);
> !             new_oprnd = make_ssa_name (interm_type);
> !             new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> !               STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> !               stmts->safe_push (def_stmt);
> !               oprnd = new_oprnd;
> !             }
> !         }
> !       else
> !         {
> !           /* Retrieve the operand before the type promotion.  */
> !           oprnd = gimple_assign_rhs1 (def_stmt);
> !         }
> !     }
> !   else
> !     {
> !       if (interm_type)
> !         {
> !           /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
> !         new_oprnd = make_ssa_name (interm_type);
> !         new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> !           oprnd = new_oprnd;
> !           *new_def_stmt = new_stmt;
> !         }
>
> !       /* Otherwise, OPRND is already set.  */
>       }
>
> !   if (interm_type)
> !     *new_type = interm_type;
> !   else
> !     *new_type = half_type;
>
> !   *op0 = oprnd;
> !   *op1 = fold_convert (*new_type, const_oprnd);
> !
> !   return true;
>   }
>
>
> ! /* Try to find a statement or a sequence of statements that can be performed
> !    on a smaller type:
>
> !      type x_t;
> !      TYPE x_T, res0_T, res1_T;
> !    loop:
> !      S1  x_t = *p;
> !      S2  x_T = (TYPE) x_t;
> !      S3  res0_T = op (x_T, C0);
> !      S4  res1_T = op (res0_T, C1);
> !      S5  ... = () res1_T;  - type demotion
> !
> !    where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> !    constants.
> !    Check if S3 and S4 can be done on a smaller type than 'TYPE'; it can either
> !    be 'type' or some intermediate type.  For now, we expect S5 to be a type
> !    demotion operation.  We also check that S3 and S4 have only one use.  */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   gimple *stmt = stmts->pop ();
> !   gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> !        *use_stmt = NULL;
> !   tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> !   tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> !   bool first;
> !   tree type = NULL;
> !
> !   first = true;
> !   while (1)
> !     {
> !       if (!vinfo_for_stmt (stmt)
> !           || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> !         return NULL;
> !
> !       new_def_stmt = NULL;
> !       if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> !                                              &op0, &op1, &new_def_stmt,
> !                                              stmts))
> !         {
> !           if (first)
> !             return NULL;
> !           else
> !             break;
> !         }
>
> !       /* STMT can be performed on a smaller type.  Check its uses.  */
> !       use_stmt = vect_single_imm_use (stmt);
> !       if (!use_stmt || !is_gimple_assign (use_stmt))
> !         return NULL;
> !
> !       /* Create pattern statement for STMT.  */
> !       vectype = get_vectype_for_scalar_type (new_type);
> !       if (!vectype)
> !         return NULL;
> !
> !       /* We want to collect all the statements for which we create pattern
> !          statements, except for the case when the last statement in the
> !          sequence doesn't have a corresponding pattern statement.  In that
> !          case we associate the last pattern statement with the last statement
> !          in the sequence.  Therefore, we only add the original statement to
> !          the list if we know that it is not the last.  */
> !       if (prev_stmt)
> !         stmts->safe_push (prev_stmt);
>
> !       var = vect_recog_temp_ssa_var (new_type, NULL);
> !       pattern_stmt
> !       = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> !       STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> !       new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
>
> !       if (dump_enabled_p ())
> !         {
> !           dump_printf_loc (MSG_NOTE, vect_location,
> !                            "created pattern stmt: ");
> !           dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> !         }
>
> !       type = gimple_expr_type (stmt);
> !       prev_stmt = stmt;
> !       stmt = use_stmt;
> !
> !       first = false;
> !     }
> !
> !   /* We got a sequence.  We expect it to end with a type demotion operation.
> !      Otherwise, we quit (for now).  There are three possible cases: the
> !      conversion is to NEW_TYPE (we don't do anything), the conversion is to
> !      a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> !      NEW_TYPE differs (we create a new conversion statement).  */
> !   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> !     {
> !       use_lhs = gimple_assign_lhs (use_stmt);
> !       use_type = TREE_TYPE (use_lhs);
> !       /* Support only type demotion or signedness change.  */
> !       if (!INTEGRAL_TYPE_P (use_type)
> !         || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> !         return NULL;
>
> !       /* Check that NEW_TYPE is not bigger than the conversion result.  */
> !       if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> !       return NULL;
>
> !       if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> !           || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> !         {
> !         *type_out = get_vectype_for_scalar_type (use_type);
> !         if (!*type_out)
> !           return NULL;
>
> !           /* Create NEW_TYPE->USE_TYPE conversion.  */
> !         new_oprnd = make_ssa_name (use_type);
> !         pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> !           STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> !
> !           /* We created a pattern statement for the last statement in the
> !              sequence, so we don't need to associate it with the pattern
> !              statement created for PREV_STMT.  Therefore, we add PREV_STMT
> !              to the list in order to mark it later in vect_pattern_recog_1.  */
> !           if (prev_stmt)
> !             stmts->safe_push (prev_stmt);
> !         }
> !       else
> !         {
> !           if (prev_stmt)
> !           STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> !              = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
>
> !         *type_out = vectype;
> !         }
>
> !       stmts->safe_push (use_stmt);
> !     }
> !   else
> !     /* TODO: support general case, create a conversion to the correct type.  */
>       return NULL;
>
> !   /* Pattern detected.  */
> !   vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
>
>     return pattern_stmt;
>   }
>
> --- 1381,1698 ----
>     return pattern_stmt;
>   }
>
> + /* Recognize cases in which an operation is performed in one type WTYPE
> +    but could be done more efficiently in a narrower type NTYPE.  For example,
> +    if we have:
> +
> +      ATYPE a;  // narrower than NTYPE
> +      BTYPE b;  // narrower than NTYPE
> +      WTYPE aw = (WTYPE) a;
> +      WTYPE bw = (WTYPE) b;
> +      WTYPE res = aw + bw;  // only uses of aw and bw
> +
> +    then it would be more efficient to do:
> +
> +      NTYPE an = (NTYPE) a;
> +      NTYPE bn = (NTYPE) b;
> +      NTYPE resn = an + bn;
> +      WTYPE res = (WTYPE) resn;
> +
> +    Other situations include things like:
> +
> +      ATYPE a;  // NTYPE or narrower
> +      WTYPE aw = (WTYPE) a;
> +      WTYPE res = aw + b;
> +
> +    when only "(NTYPE) res" is significant.  In that case it's more efficient
> +    to truncate "b" and do the operation on NTYPE instead:
> +
> +      NTYPE an = (NTYPE) a;
> +      NTYPE bn = (NTYPE) b;  // truncation
> +      NTYPE resn = an + bn;
> +      WTYPE res = (WTYPE) resn;
> +
> +    All users of "res" should then use "resn" instead, making the final
> +    statement dead (not marked as relevant).  The final statement is still
> +    needed to maintain the type correctness of the IR.
> +
> +    vect_determine_precisions has already determined the minimum
> +    precision of the operation and the minimum precision required
> +    by users of the result.  */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> !   if (!last_stmt)
> !     return NULL;
>
> !   /* See whether we have found that this operation can be done on a
> !      narrower type without changing its semantics.  */
> !   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> !   unsigned int new_precision = last_stmt_info->operation_precision;
> !   if (!new_precision)
> !     return NULL;
>
> !   vec_info *vinfo = last_stmt_info->vinfo;
> !   tree lhs = gimple_assign_lhs (last_stmt);
> !   tree type = TREE_TYPE (lhs);
> !   tree_code code = gimple_assign_rhs_code (last_stmt);
> !
> !   /* Keep the first operand of a COND_EXPR as-is: only the other two
> !      operands are interesting.  */
> !   unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> !   /* Check the operands.  */
> !   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> !   auto_vec <vect_unpromoted_value, 3> unprom (nops);
> !   unprom.quick_grow (nops);
> !   unsigned int min_precision = 0;
> !   bool single_use_p = false;
> !   for (unsigned int i = 0; i < nops; ++i)
> !     {
> !       tree op = gimple_op (last_stmt, first_op + i);
> !       if (TREE_CODE (op) == INTEGER_CST)
> !       unprom[i].set_op (op, vect_constant_def);
> !       else if (TREE_CODE (op) == SSA_NAME)
> !       {
> !         bool op_single_use_p = true;
> !         if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> !                                                    &op_single_use_p))
> !           return NULL;
> !         /* If:
>
> !            (1) N bits of the result are needed;
> !            (2) all inputs are widened from M<N bits; and
> !            (3) one operand OP is a single-use SSA name
> !
> !            we can shift the M->N widening from OP to the output
> !            without changing the number or type of extensions involved.
> !            This then reduces the number of copies of STMT_INFO.
> !
> !            If instead of (3) more than one operand is a single-use SSA name,
> !            shifting the extension to the output is even more of a win.
> !
> !            If instead:
> !
> !            (1) N bits of the result are needed;
> !            (2) one operand OP2 is widened from M2<N bits;
> !            (3) another operand OP1 is widened from M1<M2 bits; and
> !            (4) both OP1 and OP2 are single-use
> !
> !            the choice is between:
> !
> !            (a) truncating OP2 to M1, doing the operation on M1,
> !                and then widening the result to N
> !
> !            (b) widening OP1 to M2, doing the operation on M2, and then
> !                widening the result to N
> !
> !            Both shift the M2->N widening of the inputs to the output.
> !            (a) additionally shifts the M1->M2 widening to the output;
> !            it requires fewer copies of STMT_INFO but requires an extra
> !            M2->M1 truncation.
> !
> !            Which is better will depend on the complexity and cost of
> !            STMT_INFO, which is hard to predict at this stage.  However,
> !            a clear tie-breaker in favor of (b) is the fact that the
> !            truncation in (a) increases the length of the operation chain.
> !
> !            If instead of (4) only one of OP1 or OP2 is single-use,
> !            (b) is still a win over doing the operation in N bits:
> !            it still shifts the M2->N widening on the single-use operand
> !            to the output and reduces the number of STMT_INFO copies.
> !
> !            If neither operand is single-use then operating on fewer than
> !            N bits might lead to more extensions overall.  Whether it does
> !            or not depends on global information about the vectorization
> !            region, and whether that's a good trade-off would again
> !            depend on the complexity and cost of the statements involved,
> !            as well as things like register pressure that are not normally
> !            modelled at this stage.  We therefore ignore these cases
> !            and just optimize the clear single-use wins above.
> !
> !            Thus we take the maximum precision of the unpromoted operands
> !            and record whether any operand is single-use.  */
> !         if (unprom[i].dt == vect_internal_def)
> !           {
> !             min_precision = MAX (min_precision,
> !                                  TYPE_PRECISION (unprom[i].type));
> !             single_use_p |= op_single_use_p;
> !           }
> !       }
> !     }
>
> !   /* Although the operation could be done in operation_precision, we have
> !      to balance that against introducing extra truncations or extensions.
> !      Calculate the minimum precision that can be handled efficiently.
> !
> !      The loop above determined that the operation could be handled
> !      efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> !      extension from the inputs to the output without introducing more
> !      instructions, and would reduce the number of instructions required
> !      for STMT_INFO itself.
> !
> !      vect_determine_precisions has also determined that the result only
> !      needs min_output_precision bits.  Truncating by a factor of N
> !      requires a tree of N - 1 instructions, so if TYPE is N times wider
> !      than min_output_precision, doing the operation in TYPE and truncating
> !      the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> !      In contrast:
> !
> !      - truncating the input to a unary operation and doing the operation
> !        in the new type requires at most N - 1 + 1 = N instructions per
> !        output vector
> !
> !      - doing the same for a binary operation requires at most
> !        (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> !
> !      Both unary and binary operations require fewer instructions than
> !      this if the operands were extended from a suitable truncated form.
> !      Thus there is usually nothing to lose by doing operations in
> !      min_output_precision bits, but there can be something to gain.  */
> !   if (!single_use_p)
> !     min_precision = last_stmt_info->min_output_precision;
> !   else
> !     min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
> !   /* Apply the minimum efficient precision we just calculated.  */
> !   if (new_precision < min_precision)
> !     new_precision = min_precision;
> !   if (new_precision >= TYPE_PRECISION (type))
> !     return NULL;
>
> !   vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> !   *type_out = get_vectype_for_scalar_type (type);
> !   if (!*type_out)
> !     return NULL;
>
> !   /* We've found a viable pattern.  Get the new type of the operation.  */
> !   bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> !   tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> !
> !   /* We specifically don't check here whether the target supports the
> !      new operation, since it might be something that a later pattern
> !      wants to rewrite anyway.  If targets have a minimum element size
> !      for some optabs, we should pattern-match smaller ops to larger ops
> !      where beneficial.  */
> !   tree new_vectype = get_vectype_for_scalar_type (new_type);
> !   if (!new_vectype)
> !     return NULL;
>
> !   if (dump_enabled_p ())
>       {
> !       dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> !       dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> !       dump_printf (MSG_NOTE, " to ");
> !       dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> !       dump_printf (MSG_NOTE, "\n");
>       }
>
> !   /* Calculate the rhs operands for an operation on NEW_TYPE.  */
> !   STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> !   tree ops[3] = {};
> !   for (unsigned int i = 1; i < first_op; ++i)
> !     ops[i - 1] = gimple_op (last_stmt, i);
> !   vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> !                      new_type, &unprom[0], new_vectype);
> !
> !   /* Use the operation to produce a result of type NEW_TYPE.  */
> !   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> !   gimple *pattern_stmt = gimple_build_assign (new_var, code,
> !                                             ops[0], ops[1], ops[2]);
> !   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> !   if (dump_enabled_p ())
> !     {
> !       dump_printf_loc (MSG_NOTE, vect_location,
> !                      "created pattern stmt: ");
> !       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
>       }
>
> !   pattern_stmt = vect_convert_output (last_stmt_info, type,
> !                                     pattern_stmt, new_vectype);
>
> !   stmts->safe_push (last_stmt);
> !   return pattern_stmt;
>   }
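>
> To make the cost arithmetic in the comment above concrete (an
> editorial sketch, not part of the patch): for a 32-bit TYPE with
> min_output_precision 8, N = 4, giving per output vector:
>
>   operate in TYPE, then truncate:    N + (N - 1)     = 7 instructions
>   truncate input of a unary op:      (N - 1) + 1     = 4 instructions
>   truncate inputs of a binary op:    2 * (N - 1) + 1 = 7 instructions
>
> so narrowing to min_output_precision loses nothing even in the worst
> (binary) case, and wins whenever an input was already extended from a
> narrower type.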
>
> + /* Recognize cases in which the input to a cast is wider than its
> +    output, and the input is fed by a widening operation.  Fold this
> +    by removing the unnecessary intermediate widening.  E.g.:
>
> !      unsigned char a;
> !      unsigned int b = (unsigned int) a;
> !      unsigned short c = (unsigned short) b;
>
> !    -->
>
> !      unsigned short c = (unsigned short) a;
>
> !    Although this is rare in input IR, it is an expected side-effect
> !    of the over-widening pattern above.
>
> !    This is beneficial also for integer-to-float conversions, if the
> !    widened integer has more bits than the float, and if the unwidened
> !    input doesn't.  */
>
> ! static gimple *
> ! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   /* Check for a cast, including an integer-to-float conversion.  */
> !   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> !   if (!last_stmt)
> !     return NULL;
> !   tree_code code = gimple_assign_rhs_code (last_stmt);
> !   if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> !     return NULL;
>
> !   /* Make sure that the rhs is a scalar with a natural bitsize.  */
> !   tree lhs = gimple_assign_lhs (last_stmt);
> !   if (!lhs)
> !     return NULL;
> !   tree lhs_type = TREE_TYPE (lhs);
> !   scalar_mode lhs_mode;
> !   if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> !       || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> !     return NULL;
>
> !   /* Check for a narrowing operation (from a vector point of view).  */
> !   tree rhs = gimple_assign_rhs1 (last_stmt);
> !   tree rhs_type = TREE_TYPE (rhs);
> !   if (!INTEGRAL_TYPE_P (rhs_type)
> !       || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> !       || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> !     return NULL;
>
> !   /* Try to find an unpromoted input.  */
> !   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> !   vec_info *vinfo = last_stmt_info->vinfo;
> !   vect_unpromoted_value unprom;
> !   if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> !       || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> !     return NULL;
>
> !   /* If the bits above RHS_TYPE matter, make sure that they're the
> !      same when extending from UNPROM as they are when extending from RHS.  */
> !   if (!INTEGRAL_TYPE_P (lhs_type)
> !       && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> !     return NULL;
>
> !   /* We can get the same result by casting UNPROM directly, to avoid
> !      the unnecessary widening and narrowing.  */
> !   vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> !   *type_out = get_vectype_for_scalar_type (lhs_type);
> !   if (!*type_out)
>       return NULL;
>
> !   tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> !   gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
> !   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> +   stmts->safe_push (last_stmt);
>     return pattern_stmt;
>   }
>
> *************** vect_recog_gather_scatter_pattern (vec<g
> *** 4205,4210 ****
> --- 4170,4559 ----
>     return pattern_stmt;
>   }
>
> + /* Return true if TYPE is a non-boolean integer type.  These are the types
> +    that we want to consider for narrowing.  */
> +
> + static bool
> + vect_narrowable_type_p (tree type)
> + {
> +   return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> + }
> +
> + /* Return true if the operation given by CODE can be truncated to N bits
> +    when only N bits of the output are needed.  This is only true if bit N+1
> +    of the inputs has no effect on the low N bits of the result.  */
> +
> + static bool
> + vect_truncatable_operation_p (tree_code code)
> + {
> +   switch (code)
> +     {
> +     case PLUS_EXPR:
> +     case MINUS_EXPR:
> +     case MULT_EXPR:
> +     case BIT_AND_EXPR:
> +     case BIT_IOR_EXPR:
> +     case BIT_XOR_EXPR:
> +     case COND_EXPR:
> +       return true;
> +
> +     default:
> +       return false;
> +     }
> + }
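>
> A hedged illustration of the distinction (editorial, not part of the
> patch): for the codes listed above, input bit N cannot affect result
> bits N-1 and below, whereas a right shift moves high bits downwards:
>
>   unsigned char lo8 (unsigned int x) { return x; }
>   /* lo8 (a + b) == lo8 (lo8 (a) + lo8 (b)): PLUS_EXPR truncates.  */
>   /* lo8 (a >> 4) != lo8 (lo8 (a) >> 4) in general: RSHIFT_EXPR needs
>      the extra input bits, so shifts get their own case in
>      vect_determine_precisions_from_users below.  */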
> +
> + /* Record that STMT_INFO could be changed from operating on TYPE to
> +    operating on a type with the precision and sign given by PRECISION
> +    and SIGN respectively.  PRECISION is an arbitrary bit precision;
> +    it might not be a whole number of bytes.  */
> +
> + static void
> + vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> +                        unsigned int precision, signop sign)
> + {
> +   /* Round the precision up to a whole number of bytes.  */
> +   precision = vect_element_precision (precision);
> +   if (precision < TYPE_PRECISION (type)
> +       && (!stmt_info->operation_precision
> +         || stmt_info->operation_precision > precision))
> +     {
> +       stmt_info->operation_precision = precision;
> +       stmt_info->operation_sign = sign;
> +     }
> + }
> +
> + /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> +    non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
> +    is an arbitrary bit precision; it might not be a whole number of bytes.  */
> +
> + static void
> + vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> +                             unsigned int min_input_precision)
> + {
> +   /* This operation in isolation only requires the inputs to have
> +      MIN_INPUT_PRECISION bits of precision.  However, that doesn't mean
> +      that MIN_INPUT_PRECISION is a natural precision for the chain
> +      as a whole.  E.g. consider something like:
> +
> +        unsigned short *x, *y;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +      The right shift can be done on unsigned chars, and only requires the
> +      result of "*x & 0xf0" to be done on unsigned chars.  But taking that
> +      approach would mean turning a natural chain of single-vector unsigned
> +      short operations into one that truncates "*x" and then extends
> +      "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> +      operation and one vector for each unsigned char operation.
> +      This would be a significant pessimization.
> +
> +      Instead only propagate the maximum of this precision and the precision
> +      required by the users of the result.  This means that we don't pessimize
> +      the case above but continue to optimize things like:
> +
> +        unsigned char *y;
> +        unsigned short *x;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +      Here we would truncate two vectors of *x to a single vector of
> +      unsigned chars and use single-vector unsigned char operations for
> +      everything else, rather than doing two unsigned short copies of
> +      "(*x & 0xf0) >> 4" and then truncating the result.  */
> +   min_input_precision = MAX (min_input_precision,
> +                            stmt_info->min_output_precision);
> +
> +   if (min_input_precision < TYPE_PRECISION (type)
> +       && (!stmt_info->min_input_precision
> +         || stmt_info->min_input_precision > min_input_precision))
> +     stmt_info->min_input_precision = min_input_precision;
> + }
> +
> + /* Subroutine of vect_determine_min_output_precision.  Return true if
> +    we can calculate a reduced number of output bits for STMT_INFO,
> +    whose result is LHS.  */
> +
> + static bool
> + vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> + {
> +   /* Take the maximum precision required by users of the result.  */
> +   unsigned int precision = 0;
> +   imm_use_iterator iter;
> +   use_operand_p use;
> +   FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> +     {
> +       gimple *use_stmt = USE_STMT (use);
> +       if (is_gimple_debug (use_stmt))
> +       continue;
> +       if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> +       return false;
> +       stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> +       if (!use_stmt_info->min_input_precision)
> +       return false;
> +       precision = MAX (precision, use_stmt_info->min_input_precision);
> +     }
> +
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> +                      precision);
> +       dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> +       dump_printf (MSG_NOTE, " are significant\n");
> +     }
> +   stmt_info->min_output_precision = precision;
> +   return true;
> + }
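>
> A small worked example of the propagation (editorial, not from the
> patch): if one user of LHS has recorded min_input_precision 8 and
> another 16, the loop above takes the maximum and sets
> min_output_precision to 16.  If any user lies outside the
> vectorizable region or has no recorded min_input_precision, the
> function returns false and the caller falls back to the full
> TYPE_PRECISION of the result.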
> +
> + /* Calculate min_output_precision for STMT_INFO.  */
> +
> + static void
> + vect_determine_min_output_precision (stmt_vec_info stmt_info)
> + {
> +   /* We're only interested in statements with a narrowable result.  */
> +   tree lhs = gimple_get_lhs (stmt_info->stmt);
> +   if (!lhs
> +       || TREE_CODE (lhs) != SSA_NAME
> +       || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> +     return;
> +
> +   if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> +     stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> + }
> +
> + /* Use range information to decide whether STMT (described by STMT_INFO)
> +    could be done in a narrower type.  This is effectively a forward
> +    propagation, since it uses context-independent information that applies
> +    to all users of an SSA name.  */
> +
> + static void
> + vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> + {
> +   tree lhs = gimple_assign_lhs (stmt);
> +   if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> +     return;
> +
> +   tree type = TREE_TYPE (lhs);
> +   if (!vect_narrowable_type_p (type))
> +     return;
> +
> +   /* First see whether we have any useful range information for the result.  */
> +   unsigned int precision = TYPE_PRECISION (type);
> +   signop sign = TYPE_SIGN (type);
> +   wide_int min_value, max_value;
> +   if (!vect_get_range_info (lhs, &min_value, &max_value))
> +     return;
> +
> +   tree_code code = gimple_assign_rhs_code (stmt);
> +   unsigned int nops = gimple_num_ops (stmt);
> +
> +   if (!vect_truncatable_operation_p (code))
> +     /* Check that all relevant input operands are compatible, and update
> +        [MIN_VALUE, MAX_VALUE] to include their ranges.  */
> +     for (unsigned int i = 1; i < nops; ++i)
> +       {
> +       tree op = gimple_op (stmt, i);
> +       if (TREE_CODE (op) == INTEGER_CST)
> +         {
> +           /* Don't require the integer to have RHS_TYPE (which it might
> +              not for things like shift amounts, etc.), but do require it
> +              to fit the type.  */
> +           if (!int_fits_type_p (op, type))
> +             return;
> +
> +           min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> +           max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> +         }
> +       else if (TREE_CODE (op) == SSA_NAME)
> +         {
> +           /* Ignore codes that don't take uniform arguments.  */
> +           if (!types_compatible_p (TREE_TYPE (op), type))
> +             return;
> +
> +           wide_int op_min_value, op_max_value;
> +           if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> +             return;
> +
> +           min_value = wi::min (min_value, op_min_value, sign);
> +           max_value = wi::max (max_value, op_max_value, sign);
> +         }
> +       else
> +         return;
> +       }
> +
> +   /* Try to switch signed types for unsigned types if we can.
> +      This is better for two reasons.  First, unsigned ops tend
> +      to be cheaper than signed ops.  Second, it means that we can
> +      handle things like:
> +
> +       signed char c;
> +       int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> +      as:
> +
> +       signed char c;
> +       unsigned short res_1 = (unsigned short) c & 0xff00;
> +       int res = (int) res_1;
> +
> +      where the intermediate result res_1 has unsigned rather than
> +      signed type.  */
> +   if (sign == SIGNED && !wi::neg_p (min_value))
> +     sign = UNSIGNED;
> +
> +   /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
> +   unsigned int precision1 = wi::min_precision (min_value, sign);
> +   unsigned int precision2 = wi::min_precision (max_value, sign);
> +   unsigned int value_precision = MAX (precision1, precision2);
> +   if (value_precision >= precision)
> +     return;
> +
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                      " without loss of precision: ",
> +                      sign == SIGNED ? "signed" : "unsigned",
> +                      value_precision);
> +       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +     }
> +
> +   vect_set_operation_type (stmt_info, type, value_precision, sign);
> +   vect_set_min_input_precision (stmt_info, type, value_precision);
> + }
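>
> Continuing the example from the comment (a sketch under the same
> assumptions): for "res = (int) c & 0xff00" with a signed char c, the
> recorded range of res is [0x0000, 0xff00], so MIN_VALUE is
> non-negative and SIGN flips to UNSIGNED; wi::min_precision then
> gives 16 for 0xff00, which is below the 32 bits of TYPE, so the
> statement is recorded as narrowable to unsigned:16.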
> +
> + /* Use information about the users of STMT's result to decide whether
> +    STMT (described by STMT_INFO) could be done in a narrower type.
> +    This is effectively a backward propagation.  */
> +
> + static void
> + vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> + {
> +   tree_code code = gimple_assign_rhs_code (stmt);
> +   unsigned int opno = (code == COND_EXPR ? 2 : 1);
> +   tree type = TREE_TYPE (gimple_op (stmt, opno));
> +   if (!vect_narrowable_type_p (type))
> +     return;
> +
> +   unsigned int precision = TYPE_PRECISION (type);
> +   unsigned int operation_precision, min_input_precision;
> +   switch (code)
> +     {
> +     CASE_CONVERT:
> +       /* Only the bits that contribute to the output matter.  Don't change
> +        the precision of the operation itself.  */
> +       operation_precision = precision;
> +       min_input_precision = stmt_info->min_output_precision;
> +       break;
> +
> +     case LSHIFT_EXPR:
> +     case RSHIFT_EXPR:
> +       {
> +       tree shift = gimple_assign_rhs2 (stmt);
> +       if (TREE_CODE (shift) != INTEGER_CST
> +           || !wi::ltu_p (wi::to_widest (shift), precision))
> +         return;
> +       unsigned int const_shift = TREE_INT_CST_LOW (shift);
> +       if (code == LSHIFT_EXPR)
> +         {
> +           /* We need CONST_SHIFT fewer bits of the input.  */
> +           operation_precision = stmt_info->min_output_precision;
> +           min_input_precision = (MAX (operation_precision, const_shift)
> +                                   - const_shift);
> +         }
> +       else
> +         {
> +           /* We need CONST_SHIFT extra bits to do the operation.  */
> +           operation_precision = (stmt_info->min_output_precision
> +                                  + const_shift);
> +           min_input_precision = operation_precision;
> +         }
> +       break;
> +       }
> +
> +     default:
> +       if (vect_truncatable_operation_p (code))
> +       {
> +         /* Input bit N has no effect on output bits N-1 and lower.  */
> +         operation_precision = stmt_info->min_output_precision;
> +         min_input_precision = operation_precision;
> +         break;
> +       }
> +       return;
> +     }
> +
> +   if (operation_precision < precision)
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                          " without affecting users: ",
> +                          TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> +                          operation_precision);
> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +       }
> +       vect_set_operation_type (stmt_info, type, operation_precision,
> +                              TYPE_SIGN (type));
> +     }
> +   vect_set_min_input_precision (stmt_info, type, min_input_precision);
> + }
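>
> A worked example for the shift case (editorial, not from the patch):
> if only the low 8 bits of res are needed in
>
>   unsigned int res = x >> 3;
>
> then CONST_SHIFT is 3 and both precisions are computed as 8 + 3 = 11;
> vect_set_operation_type rounds the operation precision up to a whole
> 16-bit element, while min_input_precision keeps the exact 11 bits.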
> +
> + /* Handle vect_determine_precisions for STMT_INFO, given that we
> +    have already done so for the users of its result.  */
> +
> + void
> + vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> + {
> +   vect_determine_min_output_precision (stmt_info);
> +   if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> +     {
> +       vect_determine_precisions_from_range (stmt_info, stmt);
> +       vect_determine_precisions_from_users (stmt_info, stmt);
> +     }
> + }
> +
> + /* Walk backwards through the vectorizable region to determine the
> +    values of these fields:
> +
> +    - min_output_precision
> +    - min_input_precision
> +    - operation_precision
> +    - operation_sign.  */
> +
> + void
> + vect_determine_precisions (vec_info *vinfo)
> + {
> +   DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> +   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> +     {
> +       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +       basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> +       unsigned int nbbs = loop->num_nodes;
> +
> +       for (unsigned int i = 0; i < nbbs; i++)
> +       {
> +         basic_block bb = bbs[nbbs - i - 1];
> +         for (gimple_stmt_iterator si = gsi_last_bb (bb);
> +              !gsi_end_p (si); gsi_prev (&si))
> +           vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> +       }
> +     }
> +   else
> +     {
> +       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> +       gimple_stmt_iterator si = bb_vinfo->region_end;
> +       gimple *stmt;
> +       do
> +       {
> +         if (!gsi_stmt (si))
> +           si = gsi_last_bb (bb_vinfo->bb);
> +         else
> +           gsi_prev (&si);
> +         stmt = gsi_stmt (si);
> +         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +         if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> +           vect_determine_stmt_precisions (stmt_info);
> +       }
> +       while (stmt != gsi_stmt (bb_vinfo->region_begin));
> +     }
> + }
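>
> Note (editorial): the backward walk is what makes the propagation
> work in a single pass; each statement's users are visited first, so
> their min_input_precision values are already in place when
> vect_determine_min_output_precision queries them.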
> +
>   typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
>   struct vect_recog_func
> *************** struct vect_recog_func
> *** 4217,4229 ****
>    taken, which means usually the more complex one needs to precede the
>    less complex ones (widen_sum only after dot_prod or sad for example).  */
>   static vect_recog_func vect_vect_recog_func_ptrs[] = {
>     { vect_recog_widen_mult_pattern, "widen_mult" },
>     { vect_recog_dot_prod_pattern, "dot_prod" },
>     { vect_recog_sad_pattern, "sad" },
>     { vect_recog_widen_sum_pattern, "widen_sum" },
>     { vect_recog_pow_pattern, "pow" },
>     { vect_recog_widen_shift_pattern, "widen_shift" },
> -   { vect_recog_over_widening_pattern, "over_widening" },
>     { vect_recog_rotate_pattern, "rotate" },
>     { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>     { vect_recog_divmod_pattern, "divmod" },
> --- 4566,4579 ----
>    taken, which means usually the more complex one needs to precede the
>    less complex ones (widen_sum only after dot_prod or sad for example).  */
>   static vect_recog_func vect_vect_recog_func_ptrs[] = {
> +   { vect_recog_over_widening_pattern, "over_widening" },
> +   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
>     { vect_recog_widen_mult_pattern, "widen_mult" },
>     { vect_recog_dot_prod_pattern, "dot_prod" },
>     { vect_recog_sad_pattern, "sad" },
>     { vect_recog_widen_sum_pattern, "widen_sum" },
>     { vect_recog_pow_pattern, "pow" },
>     { vect_recog_widen_shift_pattern, "widen_shift" },
>     { vect_recog_rotate_pattern, "rotate" },
>     { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>     { vect_recog_divmod_pattern, "divmod" },
> *************** vect_pattern_recog (vec_info *vinfo)
> *** 4497,4502 ****
> --- 4847,4854 ----
>     unsigned int i, j;
>     auto_vec<gimple *, 1> stmts_to_replace;
>
> +   vect_determine_precisions (vinfo);
> +
>     DUMP_VECT_SCOPE ("vect_pattern_recog");
>
>     if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 62,69 ****
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 62,70 ----
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 58,64 ****
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 58,66 ----
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
>     return 0;
>   }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment.  */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
>     return 0;
>   }
>
> ! /* This is an over-widening even though the final result is still an int.
> !    It's better to do one vector of ops on chars and then widen than to
> !    widen and then do 4 vectors of ops on ints.  */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
>     return 0;
>   }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment.  */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
>     return 0;
>   }
>
> ! /* This is an over-widening even though the final result is still an int.
> !    It's better to do one vector of ops on chars and then widen than to
> !    widen and then do 4 vectors of ops on ints.  */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,62 ****
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,65 ----
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [14/n] PR85694: Rework overwidening detection
  2018-06-29 12:56 ` Richard Sandiford
  2018-07-02 11:02   ` Christophe Lyon
@ 2018-07-02 13:12   ` Richard Biener
  2018-07-03 10:02     ` Richard Sandiford
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-07-02 13:12 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > This patch is the main part of PR85694.  The aim is to recognise at least:
> >
> >   signed char *a, *b, *c;
> >   ...
> >   for (int i = 0; i < 2048; i++)
> >     c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints.  However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant.  These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands.  These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> >     (effectively a forward walk, since the range info is
> >     context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> >     users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code.  The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> >   and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Here's a version rebased on top of current trunk.  Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
>   prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before.  OK to install?

OK.

Richard.

> Richard
>
>
> 2018-06-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * poly-int.h (print_hex): New function.
>         * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
>         * dumpfile.c (dump_generic_expr): Fix formatting.
>         (dump_generic_expr_loc): New function.
>         (dump_dec, dump_hex): New poly_wide_int functions.
>         * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
>         min_input_precision, operation_precision and operation_sign.
>         * tree-vect-patterns.c (vect_get_range_info): New function.
>         (vect_same_loop_or_bb_p, vect_single_imm_use)
>         (vect_operation_fits_smaller_type): Delete.
>         (vect_look_through_possible_promotion): Add an optional
>         single_use_p parameter.
>         (vect_recog_over_widening_pattern): Rewrite to use new
>         stmt_vec_info information.  Handle one operation at a time.
>         (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
>         (vect_truncatable_operation_p, vect_set_operation_type)
>         (vect_set_min_input_precision): New functions.
>         (vect_determine_min_output_precision_1): Likewise.
>         (vect_determine_min_output_precision): Likewise.
>         (vect_determine_precisions_from_range): Likewise.
>         (vect_determine_precisions_from_users): Likewise.
>         (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
>         (vect_vect_recog_func_ptrs): Put over_widening first.
>         Add cast_forwprop.
>         (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
>         over-widening messages.
>         * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
>         * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
>         * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-5.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-6.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-7.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-8.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-9.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-10.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-11.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-12.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-13.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-14.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-15.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-16.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-17.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-18.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-19.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-20.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-21.c: Likewise.
>
> Index: gcc/poly-int.h
> ===================================================================
> *** gcc/poly-int.h      2018-06-29 12:33:06.000000000 +0100
> --- gcc/poly-int.h      2018-06-29 12:33:06.721263572 +0100
> *************** print_dec (const poly_int_pod<N, C> &val
> *** 2420,2425 ****
> --- 2420,2444 ----
>              poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
>   }
>
> + /* Use print_hex to print VALUE to FILE.  */
> +
> + template<unsigned int N, typename C>
> + void
> + print_hex (const poly_int_pod<N, C> &value, FILE *file)
> + {
> +   if (value.is_constant ())
> +     print_hex (value.coeffs[0], file);
> +   else
> +     {
> +       fprintf (file, "[");
> +       for (unsigned int i = 0; i < N; ++i)
> +       {
> +         print_hex (value.coeffs[i], file);
> +         fputc (i == N - 1 ? ']' : ',', file);
> +       }
> +     }
> + }
> +
>   /* Helper for calculating the distance between two points P1 and P2,
>      in cases where known_le (P1, P2).  T1 and T2 are the types of the
>      two positions, in either order.  The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> *** gcc/dumpfile.h      2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.h      2018-06-29 12:33:06.717263602 +0100
> *************** extern void dump_printf_loc (dump_flags_
> *** 425,430 ****
> --- 425,432 ----
>                              const char *, ...) ATTRIBUTE_PRINTF_3;
>   extern void dump_function (int phase, tree fn);
>   extern void dump_basic_block (dump_flags_t, basic_block, int);
> + extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
> +                                  dump_flags_t, tree);
>   extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
>   extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
>                                   dump_flags_t, gimple *, int);
> *************** extern bool enable_rtl_dump_file (void);
> *** 434,439 ****
> --- 436,443 ----
>
>   template<unsigned int N, typename C>
>   void dump_dec (dump_flags_t, const poly_int<N, C> &);
> + extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> + extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
>   /* In tree-dump.c  */
>   extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> *** gcc/dumpfile.c      2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.c      2018-06-29 12:33:06.717263602 +0100
> *************** dump_generic_expr (dump_flags_t dump_kin
> *** 498,507 ****
> --- 498,527 ----
>                    tree t)
>   {
>     if (dump_file && (dump_kind & pflags))
> +     print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> +
> + /* Similar to dump_generic_expr, except additionally print source location.  */
> +
> + void
> + dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
> +                      dump_flags_t extra_dump_flags, tree t)
> + {
> +   location_t srcloc = loc.get_location_t ();
> +   if (dump_file && (dump_kind & pflags))
> +     {
> +       dump_loc (dump_kind, dump_file, srcloc);
>         print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +     }
>
>     if (alt_dump_file && (dump_kind & alt_flags))
> +     {
> +       dump_loc (dump_kind, alt_dump_file, srcloc);
>         print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> +     }
>   }
>
>   /* Output a formatted message using FORMAT on appropriate dump streams.  */
> *************** template void dump_dec (dump_flags_t, co
> *** 573,578 ****
> --- 593,620 ----
>   template void dump_dec (dump_flags_t, const poly_offset_int &);
>   template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> + void
> + dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> + {
> +   if (dump_file && (dump_kind & pflags))
> +     print_dec (value, dump_file, sgn);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_dec (value, alt_dump_file, sgn);
> + }
> +
> + /* Output VALUE in hexadecimal to appropriate dump streams.  */
> +
> + void
> + dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> + {
> +   if (dump_file && (dump_kind & pflags))
> +     print_hex (value, dump_file);
> +
> +   if (alt_dump_file && (dump_kind & alt_flags))
> +     print_hex (value, alt_dump_file);
> + }
> +
>   /* Start a dump for PHASE. Store user-supplied dump flags in
>      *FLAG_PTR.  Return the number of streams opened.  Set globals
>      DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h       2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vectorizer.h       2018-06-29 12:33:06.725263540 +0100
> *************** typedef struct _stmt_vec_info {
> *** 899,904 ****
> --- 899,919 ----
>
>     /* The number of scalar stmt references from active SLP instances.  */
>     unsigned int num_slp_uses;
> +
> +   /* If nonzero, the lhs of the statement could be truncated to this
> +      many bits without affecting any users of the result.  */
> +   unsigned int min_output_precision;
> +
> +   /* If nonzero, all non-boolean input operands have the same precision,
> +      and they could each be truncated to this many bits without changing
> +      the result.  */
> +   unsigned int min_input_precision;
> +
> +   /* If OPERATION_BITS is nonzero, the statement could be performed on
> +      an integer with the sign and number of bits given by OPERATION_SIGN
> +      and OPERATION_BITS without changing the result.  */
> +   unsigned int operation_precision;
> +   signop operation_sign;
>   } *stmt_vec_info;
>
>   /* Information about a gather/scatter call.  */
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c    2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vect-patterns.c    2018-06-29 12:33:06.721263572 +0100
> *************** Software Foundation; either version 3, o
> *** 47,52 ****
> --- 47,86 ----
>   #include "omp-simd-clone.h"
>   #include "predict.h"
>
> + /* Return true if we have a useful VR_RANGE range for VAR, storing it
> +    in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
> +
> + static bool
> + vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> + {
> +   value_range_type vr_type = get_range_info (var, min_value, max_value);
> +   wide_int nonzero = get_nonzero_bits (var);
> +   signop sgn = TYPE_SIGN (TREE_TYPE (var));
> +   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> +                                        nonzero, sgn) == VR_RANGE)
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has range [");
> +         dump_hex (MSG_NOTE, *min_value);
> +         dump_printf (MSG_NOTE, ", ");
> +         dump_hex (MSG_NOTE, *max_value);
> +         dump_printf (MSG_NOTE, "]\n");
> +       }
> +       return true;
> +     }
> +   else
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has no range info\n");
> +       }
> +       return false;
> +     }
> + }
> +
>   /* Report that we've found an instance of pattern PATTERN in
>      statement STMT.  */
>
> *************** vect_supportable_direct_optab_p (tree ot
> *** 190,229 ****
>     return true;
>   }
>
> - /* Check whether STMT2 is in the same loop or basic block as STMT1.
> -    Which of the two applies depends on whether we're currently doing
> -    loop-based or basic-block-based vectorization, as determined by
> -    the vinfo_for_stmt for STMT1 (which must be defined).
> -
> -    If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> -    to be defined as well.  */
> -
> - static bool
> - vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> - {
> -   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> -   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> - }
> -
> - /* If the LHS of DEF_STMT has a single use, and that statement is
> -    in the same loop or basic block, return it.  */
> -
> - static gimple *
> - vect_single_imm_use (gimple *def_stmt)
> - {
> -   tree lhs = gimple_assign_lhs (def_stmt);
> -   use_operand_p use_p;
> -   gimple *use_stmt;
> -
> -   if (!single_imm_use (lhs, &use_p, &use_stmt))
> -     return NULL;
> -
> -   if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> -     return NULL;
> -
> -   return use_stmt;
> - }
> -
>   /* Round bit precision PRECISION up to a full element.  */
>
>   static unsigned int
> --- 224,229 ----
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 347,353 ****
>      is possible to convert OP' back to OP using a possible sign change
>      followed by a possible promotion P.  Return this OP', or null if OP is
>      not a vectorizable SSA name.  If there is a promotion P, describe its
> !    input in UNPROM, otherwise describe OP' in UNPROM.
>
>      A successful return means that it is possible to go from OP' to OP
>      via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
> --- 347,355 ----
>      is possible to convert OP' back to OP using a possible sign change
>      followed by a possible promotion P.  Return this OP', or null if OP is
>      not a vectorizable SSA name.  If there is a promotion P, describe its
> !    input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
> !    is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> !    have more than one user.
>
>      A successful return means that it is possible to go from OP' to OP
>      via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 374,380 ****
>
>   static tree
>   vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> !                                     vect_unpromoted_value *unprom)
>   {
>     tree res = NULL_TREE;
>     tree op_type = TREE_TYPE (op);
> --- 376,383 ----
>
>   static tree
>   vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> !                                     vect_unpromoted_value *unprom,
> !                                     bool *single_use_p = NULL)
>   {
>     tree res = NULL_TREE;
>     tree op_type = TREE_TYPE (op);
> *************** vect_look_through_possible_promotion (ve
> *** 420,426 ****
>         if (!def_stmt)
>         break;
>         if (dt == vect_internal_def)
> !       caster = vinfo_for_stmt (def_stmt);
>         else
>         caster = NULL;
>         gassign *assign = dyn_cast <gassign *> (def_stmt);
> --- 423,436 ----
>         if (!def_stmt)
>         break;
>         if (dt == vect_internal_def)
> !       {
> !         caster = vinfo_for_stmt (def_stmt);
> !         /* Ignore pattern statements, since we don't link uses for them.  */
> !         if (single_use_p
> !             && !STMT_VINFO_RELATED_STMT (caster)
> !             && !has_single_use (res))
> !           *single_use_p = false;
> !       }
>         else
>         caster = NULL;
>         gassign *assign = dyn_cast <gassign *> (def_stmt);
> *************** vect_recog_widen_sum_pattern (vec<gimple
> *** 1371,1733 ****
>     return pattern_stmt;
>   }
>
>
> ! /* Return TRUE if the operation in STMT can be performed on a smaller type.
>
> !    Input:
> !    STMT - a statement to check.
> !    DEF - we support operations with two operands, one of which is constant.
> !          The other operand can be defined by a demotion operation, or by a
> !          previous statement in a sequence of over-promoted operations.  In the
> !          latter case DEF is used to replace that operand.  (It is defined by a
> !          pattern statement we created for the previous statement in the
> !          sequence).
> !
> !    Input/output:
> !    NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
> !          NULL, it's the type of DEF.
> !    STMTS - additional pattern statements.  If a pattern statement (type
> !          conversion) is created in this function, its original statement is
> !          added to STMTS.
>
> !    Output:
> !    OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> !          operands to use in the new pattern statement for STMT (will be created
> !          in vect_recog_over_widening_pattern ()).
> !    NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> !          statements for STMT: the first one is a type promotion and the second
> !          one is the operation itself.  We return the type promotion statement
> !        in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> !          the second pattern statement.  */
>
> ! static bool
> ! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> !                                 tree *op0, tree *op1, gimple **new_def_stmt,
> !                                 vec<gimple *> *stmts)
> ! {
> !   enum tree_code code;
> !   tree const_oprnd, oprnd;
> !   tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> !   gimple *def_stmt, *new_stmt;
> !   bool first = false;
> !   bool promotion;
>
> !   *op0 = NULL_TREE;
> !   *op1 = NULL_TREE;
> !   *new_def_stmt = NULL;
>
> !   if (!is_gimple_assign (stmt))
> !     return false;
>
> !   code = gimple_assign_rhs_code (stmt);
> !   if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> !       && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> !     return false;
>
> !   oprnd = gimple_assign_rhs1 (stmt);
> !   const_oprnd = gimple_assign_rhs2 (stmt);
> !   type = gimple_expr_type (stmt);
>
> !   if (TREE_CODE (oprnd) != SSA_NAME
> !       || TREE_CODE (const_oprnd) != INTEGER_CST)
> !     return false;
>
> !   /* If oprnd has other uses besides that in stmt we cannot mark it
> !      as being part of a pattern only.  */
> !   if (!has_single_use (oprnd))
> !     return false;
>
> !   /* If we are in the middle of a sequence, we use DEF from a previous
> !      statement.  Otherwise, OPRND has to be a result of type promotion.  */
> !   if (*new_type)
> !     {
> !       half_type = *new_type;
> !       oprnd = def;
> !     }
> !   else
>       {
> !       first = true;
> !       if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> !                             &promotion)
> !         || !promotion
> !         || !vect_same_loop_or_bb_p (stmt, def_stmt))
> !         return false;
>       }
>
> !   /* Can we perform the operation on a smaller type?  */
> !   switch (code)
> !     {
> !       case BIT_IOR_EXPR:
> !       case BIT_XOR_EXPR:
> !       case BIT_AND_EXPR:
> !         if (!int_fits_type_p (const_oprnd, half_type))
> !           {
> !             /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
> !             if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !               return false;
> !
> !             interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !             if (!int_fits_type_p (const_oprnd, interm_type))
> !               return false;
> !           }
> !
> !         break;
> !
> !       case LSHIFT_EXPR:
> !         /* Try intermediate type - HALF_TYPE is not enough for sure.  */
> !         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !           return false;
> !
> !         /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> !           (e.g., if the original value was char, the shift amount is at most 8
> !            if we want to use short).  */
> !         if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> !           return false;
> !
> !         interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> !         if (!vect_supportable_shift (code, interm_type))
> !           return false;
> !
> !         break;
> !
> !       case RSHIFT_EXPR:
> !         if (vect_supportable_shift (code, half_type))
> !           break;
> !
> !         /* Try intermediate type - HALF_TYPE is not supported.  */
> !         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> !           return false;
> !
> !         interm_type = build_nonstandard_integer_type (
> !                         TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> !         if (!vect_supportable_shift (code, interm_type))
> !           return false;
> !
> !         break;
> !
> !       default:
> !         gcc_unreachable ();
> !     }
> !
> !   /* There are four possible cases:
> !      1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> !         the first statement in the sequence)
> !         a. The original, HALF_TYPE, is not enough - we replace the promotion
> !            from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> !         b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> !            promotion.
> !      2. OPRND is defined by a pattern statement we created.
> !         a. Its type is not sufficient for the operation, we create a new stmt:
> !            a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
> !            this statement in NEW_DEF_STMT, and it is later put in
> !          STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> !         b. OPRND is good to use in the new statement.  */
> !   if (first)
> !     {
> !       if (interm_type)
> !         {
> !           /* Replace the original type conversion HALF_TYPE->TYPE with
> !              HALF_TYPE->INTERM_TYPE.  */
> !           if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> !             {
> !               new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> !               /* Check if the already created pattern stmt is what we need.  */
> !               if (!is_gimple_assign (new_stmt)
> !                   || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> !                   || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> !                 return false;
> !
> !             stmts->safe_push (def_stmt);
> !               oprnd = gimple_assign_lhs (new_stmt);
> !             }
> !           else
> !             {
> !               /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
> !               oprnd = gimple_assign_rhs1 (def_stmt);
> !             new_oprnd = make_ssa_name (interm_type);
> !             new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> !               STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> !               stmts->safe_push (def_stmt);
> !               oprnd = new_oprnd;
> !             }
> !         }
> !       else
> !         {
> !           /* Retrieve the operand before the type promotion.  */
> !           oprnd = gimple_assign_rhs1 (def_stmt);
> !         }
> !     }
> !   else
> !     {
> !       if (interm_type)
> !         {
> !           /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
> !         new_oprnd = make_ssa_name (interm_type);
> !         new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> !           oprnd = new_oprnd;
> !           *new_def_stmt = new_stmt;
> !         }
>
> !       /* Otherwise, OPRND is already set.  */
>       }
>
> !   if (interm_type)
> !     *new_type = interm_type;
> !   else
> !     *new_type = half_type;
>
> !   *op0 = oprnd;
> !   *op1 = fold_convert (*new_type, const_oprnd);
> !
> !   return true;
>   }
>
>
> ! /* Try to find a statement or a sequence of statements that can be performed
> !    on a smaller type:
>
> !      type x_t;
> !      TYPE x_T, res0_T, res1_T;
> !    loop:
> !      S1  x_t = *p;
> !      S2  x_T = (TYPE) x_t;
> !      S3  res0_T = op (x_T, C0);
> !      S4  res1_T = op (res0_T, C1);
> !      S5  ... = () res1_T;  - type demotion
> !
> !    where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> !    constants.
> !    Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
> !    be 'type' or some intermediate type.  For now, we expect S5 to be a type
> !    demotion operation.  We also check that S3 and S4 have only one use.  */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   gimple *stmt = stmts->pop ();
> !   gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> !        *use_stmt = NULL;
> !   tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> !   tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> !   bool first;
> !   tree type = NULL;
> !
> !   first = true;
> !   while (1)
> !     {
> !       if (!vinfo_for_stmt (stmt)
> !           || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> !         return NULL;
> !
> !       new_def_stmt = NULL;
> !       if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> !                                              &op0, &op1, &new_def_stmt,
> !                                              stmts))
> !         {
> !           if (first)
> !             return NULL;
> !           else
> !             break;
> !         }
>
> !       /* STMT can be performed on a smaller type.  Check its uses.  */
> !       use_stmt = vect_single_imm_use (stmt);
> !       if (!use_stmt || !is_gimple_assign (use_stmt))
> !         return NULL;
> !
> !       /* Create pattern statement for STMT.  */
> !       vectype = get_vectype_for_scalar_type (new_type);
> !       if (!vectype)
> !         return NULL;
> !
> !       /* We want to collect all the statements for which we create pattern
> !          statements, except for the case when the last statement in the
> !          sequence doesn't have a corresponding pattern statement.  In such
> !          case we associate the last pattern statement with the last statement
> !          in the sequence.  Therefore, we only add the original statement to
> !          the list if we know that it is not the last.  */
> !       if (prev_stmt)
> !         stmts->safe_push (prev_stmt);
>
> !       var = vect_recog_temp_ssa_var (new_type, NULL);
> !       pattern_stmt
> !       = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> !       STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> !       new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
>
> !       if (dump_enabled_p ())
> !         {
> !           dump_printf_loc (MSG_NOTE, vect_location,
> !                            "created pattern stmt: ");
> !           dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> !         }
>
> !       type = gimple_expr_type (stmt);
> !       prev_stmt = stmt;
> !       stmt = use_stmt;
> !
> !       first = false;
> !     }
> !
> !   /* We got a sequence.  We expect it to end with a type demotion operation.
> !      Otherwise, we quit (for now).  There are three possible cases: the
> !      conversion is to NEW_TYPE (we don't do anything), the conversion is to
> !      a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> !      NEW_TYPE differs (we create a new conversion statement).  */
> !   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> !     {
> !       use_lhs = gimple_assign_lhs (use_stmt);
> !       use_type = TREE_TYPE (use_lhs);
> !       /* Support only type demotion or signedness change.  */
> !       if (!INTEGRAL_TYPE_P (use_type)
> !         || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> !         return NULL;
>
> !       /* Check that NEW_TYPE is not bigger than the conversion result.  */
> !       if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> !       return NULL;
>
> !       if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> !           || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> !         {
> !         *type_out = get_vectype_for_scalar_type (use_type);
> !         if (!*type_out)
> !           return NULL;
>
> !           /* Create NEW_TYPE->USE_TYPE conversion.  */
> !         new_oprnd = make_ssa_name (use_type);
> !         pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> !           STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> !
> !           /* We created a pattern statement for the last statement in the
> !              sequence, so we don't need to associate it with the pattern
> !              statement created for PREV_STMT.  Therefore, we add PREV_STMT
> !              to the list in order to mark it later in vect_pattern_recog_1.  */
> !           if (prev_stmt)
> !             stmts->safe_push (prev_stmt);
> !         }
> !       else
> !         {
> !           if (prev_stmt)
> !           STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> !              = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
>
> !         *type_out = vectype;
> !         }
>
> !       stmts->safe_push (use_stmt);
> !     }
> !   else
> !     /* TODO: support general case, create a conversion to the correct type.  */
>       return NULL;
>
> !   /* Pattern detected.  */
> !   vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
>
>     return pattern_stmt;
>   }
>
> --- 1381,1698 ----
>     return pattern_stmt;
>   }
>
> + /* Recognize cases in which an operation is performed in one type WTYPE
> +    but could be done more efficiently in a narrower type NTYPE.  For example,
> +    if we have:
> +
> +      ATYPE a;  // narrower than NTYPE
> +      BTYPE b;  // narrower than NTYPE
> +      WTYPE aw = (WTYPE) a;
> +      WTYPE bw = (WTYPE) b;
> +      WTYPE res = aw + bw;  // only uses of aw and bw
> +
> +    then it would be more efficient to do:
> +
> +      NTYPE an = (NTYPE) a;
> +      NTYPE bn = (NTYPE) b;
> +      NTYPE resn = an + bn;
> +      WTYPE res = (WTYPE) resn;
> +
> +    Other situations include things like:
> +
> +      ATYPE a;  // NTYPE or narrower
> +      WTYPE aw = (WTYPE) a;
> +      WTYPE res = aw + b;
> +
> +    when only "(NTYPE) res" is significant.  In that case it's more efficient
> +    to truncate "b" and do the operation on NTYPE instead:
> +
> +      NTYPE an = (NTYPE) a;
> +      NTYPE bn = (NTYPE) b;  // truncation
> +      NTYPE resn = an + bn;
> +      WTYPE res = (WTYPE) resn;
> +
> +    All users of "res" should then use "resn" instead, making the final
> +    statement dead (not marked as relevant).  The final statement is still
> +    needed to maintain the type correctness of the IR.
> +
> +    vect_determine_precisions has already determined the minimum
> +    precision of the operation and the minimum precision required
> +    by users of the result.  */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> !   if (!last_stmt)
> !     return NULL;
>
> !   /* See whether we have found that this operation can be done on a
> !      narrower type without changing its semantics.  */
> !   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> !   unsigned int new_precision = last_stmt_info->operation_precision;
> !   if (!new_precision)
> !     return NULL;
>
> !   vec_info *vinfo = last_stmt_info->vinfo;
> !   tree lhs = gimple_assign_lhs (last_stmt);
> !   tree type = TREE_TYPE (lhs);
> !   tree_code code = gimple_assign_rhs_code (last_stmt);
> !
> !   /* Keep the first operand of a COND_EXPR as-is: only the other two
> !      operands are interesting.  */
> !   unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> !   /* Check the operands.  */
> !   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> !   auto_vec <vect_unpromoted_value, 3> unprom (nops);
> !   unprom.quick_grow (nops);
> !   unsigned int min_precision = 0;
> !   bool single_use_p = false;
> !   for (unsigned int i = 0; i < nops; ++i)
> !     {
> !       tree op = gimple_op (last_stmt, first_op + i);
> !       if (TREE_CODE (op) == INTEGER_CST)
> !       unprom[i].set_op (op, vect_constant_def);
> !       else if (TREE_CODE (op) == SSA_NAME)
> !       {
> !         bool op_single_use_p = true;
> !         if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> !                                                    &op_single_use_p))
> !           return NULL;
> !         /* If:
>
> !            (1) N bits of the result are needed;
> !            (2) all inputs are widened from M<N bits; and
> !            (3) one operand OP is a single-use SSA name
> !
> !            we can shift the M->N widening from OP to the output
> !            without changing the number or type of extensions involved.
> !            This then reduces the number of copies of STMT_INFO.
> !
> !            If instead of (3) more than one operand is a single-use SSA name,
> !            shifting the extension to the output is even more of a win.
> !
> !            If instead:
> !
> !            (1) N bits of the result are needed;
> !            (2) one operand OP2 is widened from M2<N bits;
> !            (3) another operand OP1 is widened from M1<M2 bits; and
> !            (4) both OP1 and OP2 are single-use
> !
> !            the choice is between:
> !
> !            (a) truncating OP2 to M1, doing the operation on M1,
> !                and then widening the result to N
> !
> !            (b) widening OP1 to M2, doing the operation on M2, and then
> !                widening the result to N
> !
> !            Both shift the M2->N widening of the inputs to the output.
> !            (a) additionally shifts the M1->M2 widening to the output;
> !            it requires fewer copies of STMT_INFO but requires an extra
> !            M2->M1 truncation.
> !
> !            Which is better will depend on the complexity and cost of
> !            STMT_INFO, which is hard to predict at this stage.  However,
> !            a clear tie-breaker in favor of (b) is the fact that the
> !            truncation in (a) increases the length of the operation chain.
> !
> !            If instead of (4) only one of OP1 or OP2 is single-use,
> !            (b) is still a win over doing the operation in N bits:
> !            it still shifts the M2->N widening on the single-use operand
> !            to the output and reduces the number of STMT_INFO copies.
> !
> !            If neither operand is single-use then operating on fewer than
> !            N bits might lead to more extensions overall.  Whether it does
> !            or not depends on global information about the vectorization
> !            region, and whether that's a good trade-off would again
> !            depend on the complexity and cost of the statements involved,
> !            as well as things like register pressure that are not normally
> !            modelled at this stage.  We therefore ignore these cases
> !            and just optimize the clear single-use wins above.
> !
> !            Thus we take the maximum precision of the unpromoted operands
> !            and record whether any operand is single-use.  */
> !         if (unprom[i].dt == vect_internal_def)
> !           {
> !             min_precision = MAX (min_precision,
> !                                  TYPE_PRECISION (unprom[i].type));
> !             single_use_p |= op_single_use_p;
> !           }
> !       }
> !     }
>
> !   /* Although the operation could be done in operation_precision, we have
> !      to balance that against introducing extra truncations or extensions.
> !      Calculate the minimum precision that can be handled efficiently.
> !
> !      The loop above determined that the operation could be handled
> !      efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> !      extension from the inputs to the output without introducing more
> !      instructions, and would reduce the number of instructions required
> !      for STMT_INFO itself.
> !
> !      vect_determine_precisions has also determined that the result only
> !      needs min_output_precision bits.  Truncating by a factor of N times
> !      requires a tree of N - 1 instructions, so if TYPE is N times wider
> !      than min_output_precision, doing the operation in TYPE and truncating
> !      the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> !      In contrast:
> !
> !      - truncating the input to a unary operation and doing the operation
> !        in the new type requires at most N - 1 + 1 = N instructions per
> !        output vector
> !
> !      - doing the same for a binary operation requires at most
> !        (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> !
> !      Both unary and binary operations require fewer instructions than
> !      this if the operands were extended from a suitable truncated form.
> !      Thus there is usually nothing to lose by doing operations in
> !      min_output_precision bits, but there can be something to gain.  */
> !   if (!single_use_p)
> !     min_precision = last_stmt_info->min_output_precision;
> !   else
> !     min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
> !   /* Apply the minimum efficient precision we just calculated.  */
> !   if (new_precision < min_precision)
> !     new_precision = min_precision;
> !   if (new_precision >= TYPE_PRECISION (type))
> !     return NULL;
>
> !   vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> !   *type_out = get_vectype_for_scalar_type (type);
> !   if (!*type_out)
> !     return NULL;
>
> !   /* We've found a viable pattern.  Get the new type of the operation.  */
> !   bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> !   tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> !
> !   /* We specifically don't check here whether the target supports the
> !      new operation, since it might be something that a later pattern
> !      wants to rewrite anyway.  If targets have a minimum element size
> !      for some optabs, we should pattern-match smaller ops to larger ops
> !      where beneficial.  */
> !   tree new_vectype = get_vectype_for_scalar_type (new_type);
> !   if (!new_vectype)
> !     return NULL;
>
> !   if (dump_enabled_p ())
>       {
> !       dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> !       dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> !       dump_printf (MSG_NOTE, " to ");
> !       dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> !       dump_printf (MSG_NOTE, "\n");
>       }
>
> !   /* Calculate the rhs operands for an operation on NEW_TYPE.  */
> !   STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> !   tree ops[3] = {};
> !   for (unsigned int i = 1; i < first_op; ++i)
> !     ops[i - 1] = gimple_op (last_stmt, i);
> !   vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> !                      new_type, &unprom[0], new_vectype);
> !
> !   /* Use the operation to produce a result of type NEW_TYPE.  */
> !   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> !   gimple *pattern_stmt = gimple_build_assign (new_var, code,
> !                                             ops[0], ops[1], ops[2]);
> !   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> !   if (dump_enabled_p ())
> !     {
> !       dump_printf_loc (MSG_NOTE, vect_location,
> !                      "created pattern stmt: ");
> !       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
>       }
>
> !   pattern_stmt = vect_convert_output (last_stmt_info, type,
> !                                     pattern_stmt, new_vectype);
>
> !   stmts->safe_push (last_stmt);
> !   return pattern_stmt;
>   }
>
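As a concrete instance of what the rewritten pattern can now handle,
here is a minimal sketch (mine, not taken from the testsuite):

  /* Illustrative only: dst and src have 8-bit elements, so only the
     low 8 bits of the int-typed intermediate results matter.  The
     mask and shift can therefore be done on a type narrower than int,
     giving more elements per vector operation.  */
  void
  f (unsigned char *dst, unsigned char *src, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = (src[i] & 0xf0) >> 4;
  }
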
> + /* Recognize cases in which the input to a cast is wider than its
> +    output, and the input is fed by a widening operation.  Fold this
> +    by removing the unnecessary intermediate widening.  E.g.:
>
> !      unsigned char a;
> !      unsigned int b = (unsigned int) a;
> !      unsigned short c = (unsigned short) b;
>
> !    -->
>
> !      unsigned short c = (unsigned short) a;
>
> !    Although this is rare in input IR, it is an expected side-effect
> !    of the over-widening pattern above.
>
> !    This is beneficial also for integer-to-float conversions, if the
> !    widened integer has more bits than the float, and if the unwidened
> !    input doesn't.  */
>
> ! static gimple *
> ! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> !   /* Check for a cast, including an integer-to-float conversion.  */
> !   gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> !   if (!last_stmt)
> !     return NULL;
> !   tree_code code = gimple_assign_rhs_code (last_stmt);
> !   if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> !     return NULL;
>
> !   /* Make sure that the rhs is a scalar with a natural bitsize.  */
> !   tree lhs = gimple_assign_lhs (last_stmt);
> !   if (!lhs)
> !     return NULL;
> !   tree lhs_type = TREE_TYPE (lhs);
> !   scalar_mode lhs_mode;
> !   if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> !       || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> !     return NULL;
>
> !   /* Check for a narrowing operation (from a vector point of view).  */
> !   tree rhs = gimple_assign_rhs1 (last_stmt);
> !   tree rhs_type = TREE_TYPE (rhs);
> !   if (!INTEGRAL_TYPE_P (rhs_type)
> !       || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> !       || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> !     return NULL;
>
> !   /* Try to find an unpromoted input.  */
> !   stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> !   vec_info *vinfo = last_stmt_info->vinfo;
> !   vect_unpromoted_value unprom;
> !   if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> !       || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> !     return NULL;
>
> !   /* If the bits above RHS_TYPE matter, make sure that they're the
> !      same when extending from UNPROM as they are when extending from RHS.  */
> !   if (!INTEGRAL_TYPE_P (lhs_type)
> !       && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> !     return NULL;
>
> !   /* We can get the same result by casting UNPROM directly, to avoid
> !      the unnecessary widening and narrowing.  */
> !   vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> !   *type_out = get_vectype_for_scalar_type (lhs_type);
> !   if (!*type_out)
>       return NULL;
>
> !   tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> !   gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
> !   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> +   stmts->safe_push (last_stmt);
>     return pattern_stmt;
>   }
>
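For the integer-to-float case mentioned above, a minimal sketch (mine)
of the shape of code involved:

  /* Illustrative only: the widening of "a" to a 64-bit integer is
     redundant for the conversion to float, since (float) b and
     (float) a give the same value for every unsigned char.  The
     pattern lets the vectorizer convert directly from the narrow
     input.  */
  float
  g (unsigned char a)
  {
    unsigned long long b = a;  /* intermediate widening */
    return (float) b;          /* same as (float) a */
  }
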
> *************** vect_recog_gather_scatter_pattern (vec<g
> *** 4205,4210 ****
> --- 4170,4559 ----
>     return pattern_stmt;
>   }
>
> + /* Return true if TYPE is a non-boolean integer type.  These are the types
> +    that we want to consider for narrowing.  */
> +
> + static bool
> + vect_narrowable_type_p (tree type)
> + {
> +   return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> + }
> +
> + /* Return true if the operation given by CODE can be truncated to N bits
> +    when only N bits of the output are needed.  This is only true if the
> +    bits above the low N bits of the inputs have no effect on the low
> +    N bits of the result.  */
> +
> + static bool
> + vect_truncatable_operation_p (tree_code code)
> + {
> +   switch (code)
> +     {
> +     case PLUS_EXPR:
> +     case MINUS_EXPR:
> +     case MULT_EXPR:
> +     case BIT_AND_EXPR:
> +     case BIT_IOR_EXPR:
> +     case BIT_XOR_EXPR:
> +     case COND_EXPR:
> +       return true;
> +
> +     default:
> +       return false;
> +     }
> + }
> +
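The property being tested is plain modular arithmetic; a self-contained
check (illustrative, not part of the patch) for the PLUS_EXPR case:

  #include <assert.h>

  int
  main (void)
  {
    /* For truncatable operations such as addition, the low 8 bits of
       the result depend only on the low 8 bits of the inputs.  The
       same identity fails for e.g. right shifts and division, which
       is why they are absent from the switch above.  */
    for (unsigned int a = 0; a < 1024; ++a)
      for (unsigned int b = 0; b < 1024; ++b)
        assert ((unsigned char) (a + b)
                == (unsigned char) ((unsigned char) a
                                    + (unsigned char) b));
    return 0;
  }
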
> + /* Record that STMT_INFO could be changed from operating on TYPE to
> +    operating on a type with the precision and sign given by PRECISION
> +    and SIGN respectively.  PRECISION is an arbitrary bit precision;
> +    it might not be a whole number of bytes.  */
> +
> + static void
> + vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> +                        unsigned int precision, signop sign)
> + {
> +   /* Round the precision up to a whole number of bytes.  */
> +   precision = vect_element_precision (precision);
> +   if (precision < TYPE_PRECISION (type)
> +       && (!stmt_info->operation_precision
> +         || stmt_info->operation_precision > precision))
> +     {
> +       stmt_info->operation_precision = precision;
> +       stmt_info->operation_sign = sign;
> +     }
> + }
> +
> + /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> +    non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
> +    is an arbitrary bit precision; it might not be a whole number of bytes.  */
> +
> + static void
> + vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> +                             unsigned int min_input_precision)
> + {
> +   /* This operation in isolation only requires the inputs to have
> +      MIN_INPUT_PRECISION of precision.  However, that doesn't mean
> +      that MIN_INPUT_PRECISION is a natural precision for the chain
> +      as a whole.  E.g. consider something like:
> +
> +        unsigned short *x, *y;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +      The right shift can be done on unsigned chars, and only requires the
> +      result of "*x & 0xf0" to be done on unsigned chars.  But taking that
> +      approach would mean turning a natural chain of single-vector unsigned
> +      short operations into one that truncates "*x" and then extends
> +      "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> +      operation and one vector for each unsigned char operation.
> +      This would be a significant pessimization.
> +
> +      Instead only propagate the maximum of this precision and the precision
> +      required by the users of the result.  This means that we don't pessimize
> +      the case above but continue to optimize things like:
> +
> +        unsigned char *y;
> +        unsigned short *x;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +      Here we would truncate two vectors of *x to a single vector of
> +      unsigned chars and use single-vector unsigned char operations for
> +      everything else, rather than doing two unsigned short copies of
> +      "(*x & 0xf0) >> 4" and then truncating the result.  */
> +   min_input_precision = MAX (min_input_precision,
> +                            stmt_info->min_output_precision);
> +
> +   if (min_input_precision < TYPE_PRECISION (type)
> +       && (!stmt_info->min_input_precision
> +         || stmt_info->min_input_precision > min_input_precision))
> +     stmt_info->min_input_precision = min_input_precision;
> + }
> +
> + /* Subroutine of vect_determine_min_output_precision.  Return true if
> +    we can calculate a reduced number of output bits for STMT_INFO,
> +    whose result is LHS.  */
> +
> + static bool
> + vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> + {
> +   /* Take the maximum precision required by users of the result.  */
> +   unsigned int precision = 0;
> +   imm_use_iterator iter;
> +   use_operand_p use;
> +   FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> +     {
> +       gimple *use_stmt = USE_STMT (use);
> +       if (is_gimple_debug (use_stmt))
> +       continue;
> +       if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> +       return false;
> +       stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> +       if (!use_stmt_info->min_input_precision)
> +       return false;
> +       precision = MAX (precision, use_stmt_info->min_input_precision);
> +     }
> +
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> +                      precision);
> +       dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> +       dump_printf (MSG_NOTE, " are significant\n");
> +     }
> +   stmt_info->min_output_precision = precision;
> +   return true;
> + }
> +
> + /* Calculate min_output_precision for STMT_INFO.  */
> +
> + static void
> + vect_determine_min_output_precision (stmt_vec_info stmt_info)
> + {
> +   /* We're only interested in statements with a narrowable result.  */
> +   tree lhs = gimple_get_lhs (stmt_info->stmt);
> +   if (!lhs
> +       || TREE_CODE (lhs) != SSA_NAME
> +       || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> +     return;
> +
> +   if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> +     stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> + }
> +
> + /* Use range information to decide whether STMT (described by STMT_INFO)
> +    could be done in a narrower type.  This is effectively a forward
> +    propagation, since it uses context-independent information that applies
> +    to all users of an SSA name.  */
> +
> + static void
> + vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> + {
> +   tree lhs = gimple_assign_lhs (stmt);
> +   if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> +     return;
> +
> +   tree type = TREE_TYPE (lhs);
> +   if (!vect_narrowable_type_p (type))
> +     return;
> +
> +   /* First see whether we have any useful range information for the result.  */
> +   unsigned int precision = TYPE_PRECISION (type);
> +   signop sign = TYPE_SIGN (type);
> +   wide_int min_value, max_value;
> +   if (!vect_get_range_info (lhs, &min_value, &max_value))
> +     return;
> +
> +   tree_code code = gimple_assign_rhs_code (stmt);
> +   unsigned int nops = gimple_num_ops (stmt);
> +
> +   if (!vect_truncatable_operation_p (code))
> +     /* Check that all relevant input operands are compatible, and update
> +        [MIN_VALUE, MAX_VALUE] to include their ranges.  */
> +     for (unsigned int i = 1; i < nops; ++i)
> +       {
> +       tree op = gimple_op (stmt, i);
> +       if (TREE_CODE (op) == INTEGER_CST)
> +         {
> +           /* Don't require the integer to have TYPE (which it might
> +              not for things like shift amounts, etc.), but do require it
> +              to fit the type.  */
> +           if (!int_fits_type_p (op, type))
> +             return;
> +
> +           min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> +           max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> +         }
> +       else if (TREE_CODE (op) == SSA_NAME)
> +         {
> +           /* Ignore codes that don't take uniform arguments.  */
> +           if (!types_compatible_p (TREE_TYPE (op), type))
> +             return;
> +
> +           wide_int op_min_value, op_max_value;
> +           if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> +             return;
> +
> +           min_value = wi::min (min_value, op_min_value, sign);
> +           max_value = wi::max (max_value, op_max_value, sign);
> +         }
> +       else
> +         return;
> +       }
> +
> +   /* Try to switch signed types for unsigned types if we can.
> +      This is better for two reasons.  First, unsigned ops tend
> +      to be cheaper than signed ops.  Second, it means that we can
> +      handle things like:
> +
> +       signed char c;
> +       int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> +      as:
> +
> +       signed char c;
> +       unsigned short res_1 = (unsigned short) c & 0xff00;
> +       int res = (int) res_1;
> +
> +      where the intermediate result res_1 has unsigned rather than
> +      signed type.  */
> +   if (sign == SIGNED && !wi::neg_p (min_value))
> +     sign = UNSIGNED;
> +
> +   /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
> +   unsigned int precision1 = wi::min_precision (min_value, sign);
> +   unsigned int precision2 = wi::min_precision (max_value, sign);
> +   unsigned int value_precision = MAX (precision1, precision2);
> +   if (value_precision >= precision)
> +     return;
> +
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                      " without loss of precision: ",
> +                      sign == SIGNED ? "signed" : "unsigned",
> +                      value_precision);
> +       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +     }
> +
> +   vect_set_operation_type (stmt_info, type, value_precision, sign);
> +   vect_set_min_input_precision (stmt_info, type, value_precision);
> + }
> +
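A minimal sketch (mine) of a loop whose narrowing relies purely on this
range-based forward propagation; the exact precisions the pass picks
may differ:

  void
  h (int *dst, const int *src, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        int x = src[i] & 0xff;  /* range [0, 0xff]: 8 bits, unsigned */
        dst[i] = x * 3;         /* range [0, 0x2fd]: fits 16 bits */
      }
  }
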
> + /* Use information about the users of STMT's result to decide whether
> +    STMT (described by STMT_INFO) could be done in a narrower type.
> +    This is effectively a backward propagation.  */
> +
> + static void
> + vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> + {
> +   tree_code code = gimple_assign_rhs_code (stmt);
> +   unsigned int opno = (code == COND_EXPR ? 2 : 1);
> +   tree type = TREE_TYPE (gimple_op (stmt, opno));
> +   if (!vect_narrowable_type_p (type))
> +     return;
> +
> +   unsigned int precision = TYPE_PRECISION (type);
> +   unsigned int operation_precision, min_input_precision;
> +   switch (code)
> +     {
> +     CASE_CONVERT:
> +       /* Only the bits that contribute to the output matter.  Don't change
> +        the precision of the operation itself.  */
> +       operation_precision = precision;
> +       min_input_precision = stmt_info->min_output_precision;
> +       break;
> +
> +     case LSHIFT_EXPR:
> +     case RSHIFT_EXPR:
> +       {
> +       tree shift = gimple_assign_rhs2 (stmt);
> +       if (TREE_CODE (shift) != INTEGER_CST
> +           || !wi::ltu_p (wi::to_widest (shift), precision))
> +         return;
> +       unsigned int const_shift = TREE_INT_CST_LOW (shift);
> +       if (code == LSHIFT_EXPR)
> +         {
> +           /* We need CONST_SHIFT fewer bits of the input.  */
> +           operation_precision = stmt_info->min_output_precision;
> +           min_input_precision = (MAX (operation_precision, const_shift)
> +                                   - const_shift);
> +         }
> +       else
> +         {
> +           /* We need CONST_SHIFT extra bits to do the operation.  */
> +           operation_precision = (stmt_info->min_output_precision
> +                                  + const_shift);
> +           min_input_precision = operation_precision;
> +         }
> +       break;
> +       }
> +
> +     default:
> +       if (vect_truncatable_operation_p (code))
> +       {
> +         /* Input bit N has no effect on output bits N-1 and lower.  */
> +         operation_precision = stmt_info->min_output_precision;
> +         min_input_precision = operation_precision;
> +         break;
> +       }
> +       return;
> +     }
> +
> +   if (operation_precision < precision)
> +     {
> +       if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                          " without affecting users: ",
> +                          TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> +                          operation_precision);
> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +       }
> +       vect_set_operation_type (stmt_info, type, operation_precision,
> +                              TYPE_SIGN (type));
> +     }
> +   vect_set_min_input_precision (stmt_info, type, min_input_precision);
> + }
> +
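The shift bookkeeping above can be spelled out as a small helper (the
function and its name are mine, purely illustrative):

  /* Number of low bits of the shift input needed for the low OUT_BITS
     of the result to be correct, mirroring the LSHIFT_EXPR and
     RSHIFT_EXPR cases above.  E.g. with OUT_BITS == 8, "x << 3" needs
     only the low 5 bits of x, while "x >> 3" needs the low 11 bits.  */
  static unsigned int
  shift_input_bits_needed (bool left_p, unsigned int out_bits,
                           unsigned int shift)
  {
    if (left_p)
      /* Low OUT_BITS of (x << SHIFT) come from bits 0..OUT_BITS-SHIFT-1.  */
      return out_bits > shift ? out_bits - shift : 0;
    /* Low OUT_BITS of (x >> SHIFT) come from bits SHIFT..SHIFT+OUT_BITS-1.  */
    return out_bits + shift;
  }
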
> + /* Handle vect_determine_precisions for STMT_INFO, given that we
> +    have already done so for the users of its result.  */
> +
> + void
> + vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> + {
> +   vect_determine_min_output_precision (stmt_info);
> +   if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> +     {
> +       vect_determine_precisions_from_range (stmt_info, stmt);
> +       vect_determine_precisions_from_users (stmt_info, stmt);
> +     }
> + }
> +
> + /* Walk backwards through the vectorizable region to determine the
> +    values of these fields:
> +
> +    - min_output_precision
> +    - min_input_precision
> +    - operation_precision
> +    - operation_sign.  */
> +
> + void
> + vect_determine_precisions (vec_info *vinfo)
> + {
> +   DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> +   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> +     {
> +       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +       basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> +       unsigned int nbbs = loop->num_nodes;
> +
> +       for (unsigned int i = 0; i < nbbs; i++)
> +       {
> +         basic_block bb = bbs[nbbs - i - 1];
> +         for (gimple_stmt_iterator si = gsi_last_bb (bb);
> +              !gsi_end_p (si); gsi_prev (&si))
> +           vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> +       }
> +     }
> +   else
> +     {
> +       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> +       gimple_stmt_iterator si = bb_vinfo->region_end;
> +       gimple *stmt;
> +       do
> +       {
> +         if (!gsi_stmt (si))
> +           si = gsi_last_bb (bb_vinfo->bb);
> +         else
> +           gsi_prev (&si);
> +         stmt = gsi_stmt (si);
> +         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +         if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> +           vect_determine_stmt_precisions (stmt_info);
> +       }
> +       while (stmt != gsi_stmt (bb_vinfo->region_begin));
> +     }
> + }
> +
>   typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
>   struct vect_recog_func
> *************** struct vect_recog_func
> *** 4217,4229 ****
>    taken which means usually the more complex one needs to precede the
>    less complex ones (widen_sum only after dot_prod or sad for example).  */
>   static vect_recog_func vect_vect_recog_func_ptrs[] = {
>     { vect_recog_widen_mult_pattern, "widen_mult" },
>     { vect_recog_dot_prod_pattern, "dot_prod" },
>     { vect_recog_sad_pattern, "sad" },
>     { vect_recog_widen_sum_pattern, "widen_sum" },
>     { vect_recog_pow_pattern, "pow" },
>     { vect_recog_widen_shift_pattern, "widen_shift" },
> -   { vect_recog_over_widening_pattern, "over_widening" },
>     { vect_recog_rotate_pattern, "rotate" },
>     { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>     { vect_recog_divmod_pattern, "divmod" },
> --- 4566,4579 ----
>    taken which means usually the more complex one needs to precede the
>    less complex ones (widen_sum only after dot_prod or sad for example).  */
>   static vect_recog_func vect_vect_recog_func_ptrs[] = {
> +   { vect_recog_over_widening_pattern, "over_widening" },
> +   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
>     { vect_recog_widen_mult_pattern, "widen_mult" },
>     { vect_recog_dot_prod_pattern, "dot_prod" },
>     { vect_recog_sad_pattern, "sad" },
>     { vect_recog_widen_sum_pattern, "widen_sum" },
>     { vect_recog_pow_pattern, "pow" },
>     { vect_recog_widen_shift_pattern, "widen_shift" },
>     { vect_recog_rotate_pattern, "rotate" },
>     { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>     { vect_recog_divmod_pattern, "divmod" },
> *************** vect_pattern_recog (vec_info *vinfo)
> *** 4497,4502 ****
> --- 4847,4854 ----
>     unsigned int i, j;
>     auto_vec<gimple *, 1> stmts_to_replace;
>
> +   vect_determine_precisions (vinfo);
> +
>     DUMP_VECT_SCOPE ("vect_pattern_recog");
>
>     if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 62,69 ****
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 62,70 ----
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 58,64 ****
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 58,66 ----
>   }
>
>   /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
>     return 0;
>   }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment.  */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
>     return 0;
>   }
>
> ! /* This is an over-widening even though the final result is still an int.
> !    It's better to do one vector of ops on chars and then widen than to
> !    widen and then do 4 vectors of ops on ints.  */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
>     return 0;
>   }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment.  */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
>     return 0;
>   }
>
> ! /* This is an over-widening even though the final result is still an int.
> !    It's better to do one vector of ops on chars and then widen than to
> !    widen and then do 4 vectors of ops on ints.  */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,62 ****
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,65 ----
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
>   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-02 11:02   ` Christophe Lyon
@ 2018-07-02 13:37     ` Richard Sandiford
  2018-07-02 13:52       ` Christophe Lyon
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2018-07-02 13:37 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches

Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > [...]
>>
>> Here's a version rebased on top of current trunk.  Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>>   prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before.  OK to install?
>>
>
> Hi Richard,
>
> This patch introduces regressions on arm-none-linux-gnueabihf:
>     gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
>     gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 8
>     gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
>     gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
>     gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
>     gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
>     gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
>     gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2

Sorry about that; it was caused by a stupid typo.  I've applied the
below as obvious.

(For the record, it was actually 12/n that caused this.  14/n hasn't
been applied yet.)

Thanks,
Richard


2018-07-02  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo
	in dump string.

Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2018-07-02 14:30:57.000000000 +0100
+++ gcc/tree-vect-patterns.c	2018-07-02 14:30:57.383750450 +0100
@@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp
 {
   return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR,
 				      WIDEN_LSHIFT_EXPR, true,
-				      "vect_widen_shift_pattern");
+				      "vect_recog_widen_shift_pattern");
 }
 
 /* Detect a rotate pattern wouldn't be otherwise vectorized:
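
(For reference, the scans that regressed grep the vect dump for the
exact pattern name, along the lines of:

    /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" } } */

so with the stray "vect_widen_shift_pattern" string every such
directive counted zero matches.)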


* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-02 13:37     ` Richard Sandiford
@ 2018-07-02 13:52       ` Christophe Lyon
  0 siblings, 0 replies; 10+ messages in thread
From: Christophe Lyon @ 2018-07-02 13:52 UTC (permalink / raw)
  To: gcc Patches, Richard Sandiford

On Mon, 2 Jul 2018 at 15:37, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Christophe Lyon <christophe.lyon@linaro.org> writes:
> > [...]
>
> Sorry about that; it was caused by a stupid typo.  I've applied the
> below as obvious.
>
> (For the record, it was actually 12/n that caused this.  14/n hasn't
> been applied yet.)
>
Sorry about the confusion, I probably messed up in gmail when
searching for the mail containing the patch that caused the
regression.



* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-02 13:12   ` Richard Biener
@ 2018-07-03 10:02     ` Richard Sandiford
  2018-07-03 20:08       ` Christophe Lyon
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2018-07-03 10:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > [...]
>>
>> Here's a version rebased on top of current trunk.  Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>>   prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before.  OK to install?
>
> OK.

Thanks.  For the record, here's what I installed (updated on top of
Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).

Richard


2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
	widen_mult pattern.
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.
------------------------------------------------------------------------------
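
(As a quick illustration of the transform on the motivating loop, in
hand-written gimple-like pseudocode rather than real dump output:

    /* c[i] = (a[i] + b[i]) >> 1, with signed char a, b, c.  */
    _1 = (int) a_i;            -->   _1' = (short) a_i;
    _2 = (int) b_i;            -->   _2' = (short) b_i;
    _3 = _1 + _2;              -->   _3' = _1' + _2';
    _4 = _3 >> 1;              -->   _4' = _3' >> 1;
    c_i = (signed char) _4;    -->   c_i = (signed char) _4';

The addition and shift move from int to short, and the trailing
narrowing cast absorbs the conversion back.)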

Index: gcc/poly-int.h
===================================================================
--- gcc/poly-int.h	2018-07-03 09:01:31.075962445 +0100
+++ gcc/poly-int.h	2018-07-03 09:02:36.563413564 +0100
@@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &val
 	     poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
 }
 
+/* Use print_hex to print VALUE to FILE.  */
+
+template<unsigned int N, typename C>
+void
+print_hex (const poly_int_pod<N, C> &value, FILE *file)
+{
+  if (value.is_constant ())
+    print_hex (value.coeffs[0], file);
+  else
+    {
+      fprintf (file, "[");
+      for (unsigned int i = 0; i < N; ++i)
+	{
+	  print_hex (value.coeffs[i], file);
+	  fputc (i == N - 1 ? ']' : ',', file);
+	}
+    }
+}
+
 /* Helper for calculating the distance between two points P1 and P2,
    in cases where known_le (P1, P2).  T1 and T2 are the types of the
    two positions, in either order.  The coefficients of P2 - P1 have
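
(For illustration: with the function above, a constant poly_int prints
as a bare hex number, while a genuinely variable one prints all of its
coefficients in brackets.  A sketch, eliding the wide-int plumbing --
on a target with two poly_int coefficients, a value with coefficients
{16, 16} gives:

    print_hex (len, dump_file);   /* prints "[0x10,0x10]" */

and a constant 16 prints as plain "0x10".)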
Index: gcc/dumpfile.h
===================================================================
--- gcc/dumpfile.h	2018-07-02 14:30:09.280175397 +0100
+++ gcc/dumpfile.h	2018-07-03 09:02:36.563413564 +0100
@@ -436,6 +436,8 @@ extern bool enable_rtl_dump_file (void);
 
 template<unsigned int N, typename C>
 void dump_dec (dump_flags_t, const poly_int<N, C> &);
+extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+extern void dump_hex (dump_flags_t, const poly_wide_int &);
 
 /* In tree-dump.c  */
 extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
--- gcc/dumpfile.c	2018-07-03 09:01:31.071962478 +0100
+++ gcc/dumpfile.c	2018-07-03 09:02:36.563413564 +0100
@@ -597,6 +597,28 @@ template void dump_dec (dump_flags_t, co
 template void dump_dec (dump_flags_t, const poly_offset_int &);
 template void dump_dec (dump_flags_t, const poly_widest_int &);
 
+void
+dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+{
+  if (dump_file && (dump_kind & pflags))
+    print_dec (value, dump_file, sgn);
+
+  if (alt_dump_file && (dump_kind & alt_flags))
+    print_dec (value, alt_dump_file, sgn);
+}
+
+/* Output VALUE in hexadecimal to appropriate dump streams.  */
+
+void
+dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+{
+  if (dump_file && (dump_kind & pflags))
+    print_hex (value, dump_file);
+
+  if (alt_dump_file && (dump_kind & alt_flags))
+    print_hex (value, alt_dump_file);
+}
+
 /* The current dump scope-nesting depth.  */
 
 static int dump_scope_depth;
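
(These entry points feed the range notes in the vect dumps;
vect_get_range_info further down emits lines of the form, with
illustrative values:

    x_1 has range [0xffffff80, 0x7f]

where both bounds go through the new dump_hex.)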
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2018-07-03 09:01:31.079962411 +0100
+++ gcc/tree-vectorizer.h	2018-07-03 09:02:36.567413531 +0100
@@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
 
   /* The number of scalar stmt references from active SLP instances.  */
   unsigned int num_slp_uses;
+
+  /* If nonzero, the lhs of the statement could be truncated to this
+     many bits without affecting any users of the result.  */
+  unsigned int min_output_precision;
+
+  /* If nonzero, all non-boolean input operands have the same precision,
+     and they could each be truncated to this many bits without changing
+     the result.  */
+  unsigned int min_input_precision;
+
+  /* If OPERATION_BITS is nonzero, the statement could be performed on
+     an integer with the sign and number of bits given by OPERATION_SIGN
+     and OPERATION_BITS without changing the result.  */
+  unsigned int operation_precision;
+  signop operation_sign;
 } *stmt_vec_info;
 
 /* Information about a gather/scatter call.  */
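
(To make the new fields concrete: for the motivating
"c[i] = (a[i] + b[i]) >> 1" loop with signed char a, b and c, the
values work out roughly as follows -- inferred by hand from the rules
later in the patch, not dumped from a real run:

    add   _3 = _1 + _2:   min_output_precision = 9   (the shift user
                                                      needs 8 + 1 bits)
                          operation_precision  = 16  (9 rounded up to a
                                                      full element)
                          operation_sign       = SIGNED
    shift _4 = _3 >> 1:   min_output_precision = 8   (stored to signed char)
                          operation_precision  = 16  (8 + 1, rounded)

so both statements can be performed on shorts.)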
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2018-07-03 09:01:31.035962780 +0100
+++ gcc/tree-vect-patterns.c	2018-07-03 09:02:36.567413531 +0100
@@ -47,6 +47,40 @@ Software Foundation; either version 3, o
 #include "omp-simd-clone.h"
 #include "predict.h"
 
+/* Return true if we have a useful VR_RANGE range for VAR, storing it
+   in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
+
+static bool
+vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+{
+  value_range_type vr_type = get_range_info (var, min_value, max_value);
+  wide_int nonzero = get_nonzero_bits (var);
+  signop sgn = TYPE_SIGN (TREE_TYPE (var));
+  if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+					 nonzero, sgn) == VR_RANGE)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+	  dump_printf (MSG_NOTE, " has range [");
+	  dump_hex (MSG_NOTE, *min_value);
+	  dump_printf (MSG_NOTE, ", ");
+	  dump_hex (MSG_NOTE, *max_value);
+	  dump_printf (MSG_NOTE, "]\n");
+	}
+      return true;
+    }
+  else
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+	  dump_printf (MSG_NOTE, " has no range info\n");
+	}
+      return false;
+    }
+}
+
 /* Report that we've found an instance of pattern PATTERN in
    statement STMT.  */
 
@@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree ot
   return true;
 }
 
-/* Check whether STMT2 is in the same loop or basic block as STMT1.
-   Which of the two applies depends on whether we're currently doing
-   loop-based or basic-block-based vectorization, as determined by
-   the vinfo_for_stmt for STMT1 (which must be defined).
-
-   If this returns true, vinfo_for_stmt for STMT2 is guaranteed
-   to be defined as well.  */
-
-static bool
-vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
-{
-  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
-  return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
-}
-
-/* If the LHS of DEF_STMT has a single use, and that statement is
-   in the same loop or basic block, return it.  */
-
-static gimple *
-vect_single_imm_use (gimple *def_stmt)
-{
-  tree lhs = gimple_assign_lhs (def_stmt);
-  use_operand_p use_p;
-  gimple *use_stmt;
-
-  if (!single_imm_use (lhs, &use_p, &use_stmt))
-    return NULL;
-
-  if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
-    return NULL;
-
-  return use_stmt;
-}
-
 /* Round bit precision PRECISION up to a full element.  */
 
 static unsigned int
@@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_i
    is possible to convert OP' back to OP using a possible sign change
    followed by a possible promotion P.  Return this OP', or null if OP is
    not a vectorizable SSA name.  If there is a promotion P, describe its
-   input in UNPROM, otherwise describe OP' in UNPROM.
+   input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
+   is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
+   have more than one user.
 
    A successful return means that it is possible to go from OP' to OP
    via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
@@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_i
 
 static tree
 vect_look_through_possible_promotion (vec_info *vinfo, tree op,
-				      vect_unpromoted_value *unprom)
+				      vect_unpromoted_value *unprom,
+				      bool *single_use_p = NULL)
 {
   tree res = NULL_TREE;
   tree op_type = TREE_TYPE (op);
@@ -420,7 +423,14 @@ vect_look_through_possible_promotion (ve
       if (!def_stmt)
 	break;
       if (dt == vect_internal_def)
-	caster = vinfo_for_stmt (def_stmt);
+	{
+	  caster = vinfo_for_stmt (def_stmt);
+	  /* Ignore pattern statements, since we don't link uses for them.  */
+	  if (single_use_p
+	      && !STMT_VINFO_RELATED_STMT (caster)
+	      && !has_single_use (res))
+	    *single_use_p = false;
+	}
       else
 	caster = NULL;
       gassign *assign = dyn_cast <gassign *> (def_stmt);
@@ -1371,363 +1381,318 @@ vect_recog_widen_sum_pattern (vec<gimple
   return pattern_stmt;
 }
 
+/* Recognize cases in which an operation is performed in one type WTYPE
+   but could be done more efficiently in a narrower type NTYPE.  For example,
+   if we have:
+
+     ATYPE a;  // narrower than NTYPE
+     BTYPE b;  // narrower than NTYPE
+     WTYPE aw = (WTYPE) a;
+     WTYPE bw = (WTYPE) b;
+     WTYPE res = aw + bw;  // only uses of aw and bw
+
+   then it would be more efficient to do:
+
+     NTYPE an = (NTYPE) a;
+     NTYPE bn = (NTYPE) b;
+     NTYPE resn = an + bn;
+     WTYPE res = (WTYPE) resn;
+
+   Other situations include things like:
+
+     ATYPE a;  // NTYPE or narrower
+     WTYPE aw = (WTYPE) a;
+     WTYPE res = aw + b;
+
+   when only "(NTYPE) res" is significant.  In that case it's more efficient
+   to truncate "b" and do the operation on NTYPE instead:
+
+     NTYPE an = (NTYPE) a;
+     NTYPE bn = (NTYPE) b;  // truncation
+     NTYPE resn = an + bn;
+     WTYPE res = (WTYPE) resn;
+
+   All users of "res" should then use "resn" instead, making the final
+   statement dead (not marked as relevant).  The final statement is still
+   needed to maintain the type correctness of the IR.
+
+   vect_determine_precisions has already determined the minimum
+   precision of the operation and the minimum precision required
+   by users of the result.  */
 
-/* Return TRUE if the operation in STMT can be performed on a smaller type.
+static gimple *
+vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+  if (!last_stmt)
+    return NULL;
 
-   Input:
-   STMT - a statement to check.
-   DEF - we support operations with two operands, one of which is constant.
-         The other operand can be defined by a demotion operation, or by a
-         previous statement in a sequence of over-promoted operations.  In the
-         later case DEF is used to replace that operand.  (It is defined by a
-         pattern statement we created for the previous statement in the
-         sequence).
-
-   Input/output:
-   NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
-         NULL, it's the type of DEF.
-   STMTS - additional pattern statements.  If a pattern statement (type
-         conversion) is created in this function, its original statement is
-         added to STMTS.
+  /* See whether we have found that this operation can be done on a
+     narrower type without changing its semantics.  */
+  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+  unsigned int new_precision = last_stmt_info->operation_precision;
+  if (!new_precision)
+    return NULL;
 
-   Output:
-   OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
-         operands to use in the new pattern statement for STMT (will be created
-         in vect_recog_over_widening_pattern ()).
-   NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
-         statements for STMT: the first one is a type promotion and the second
-         one is the operation itself.  We return the type promotion statement
-	 in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
-         the second pattern statement.  */
+  vec_info *vinfo = last_stmt_info->vinfo;
+  tree lhs = gimple_assign_lhs (last_stmt);
+  tree type = TREE_TYPE (lhs);
+  tree_code code = gimple_assign_rhs_code (last_stmt);
+
+  /* Keep the first operand of a COND_EXPR as-is: only the other two
+     operands are interesting.  */
+  unsigned int first_op = (code == COND_EXPR ? 2 : 1);
 
-static bool
-vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
-				  tree *op0, tree *op1, gimple **new_def_stmt,
-				  vec<gimple *> *stmts)
-{
-  enum tree_code code;
-  tree const_oprnd, oprnd;
-  tree interm_type = NULL_TREE, half_type, new_oprnd, type;
-  gimple *def_stmt, *new_stmt;
-  bool first = false;
-  bool promotion;
+  /* Check the operands.  */
+  unsigned int nops = gimple_num_ops (last_stmt) - first_op;
+  auto_vec <vect_unpromoted_value, 3> unprom (nops);
+  unprom.quick_grow (nops);
+  unsigned int min_precision = 0;
+  bool single_use_p = false;
+  for (unsigned int i = 0; i < nops; ++i)
+    {
+      tree op = gimple_op (last_stmt, first_op + i);
+      if (TREE_CODE (op) == INTEGER_CST)
+	unprom[i].set_op (op, vect_constant_def);
+      else if (TREE_CODE (op) == SSA_NAME)
+	{
+	  bool op_single_use_p = true;
+	  if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
+						     &op_single_use_p))
+	    return NULL;
+	  /* If:
 
-  *op0 = NULL_TREE;
-  *op1 = NULL_TREE;
-  *new_def_stmt = NULL;
+	     (1) N bits of the result are needed;
+	     (2) all inputs are widened from M<N bits; and
+	     (3) one operand OP is a single-use SSA name
+
+	     we can shift the M->N widening from OP to the output
+	     without changing the number or type of extensions involved.
+	     This then reduces the number of copies of STMT_INFO.
+
+	     If instead of (3) more than one operand is a single-use SSA name,
+	     shifting the extension to the output is even more of a win.
+
+	     If instead:
+
+	     (1) N bits of the result are needed;
+	     (2) one operand OP2 is widened from M2<N bits;
+	     (3) another operand OP1 is widened from M1<M2 bits; and
+	     (4) both OP1 and OP2 are single-use
+
+	     the choice is between:
+
+	     (a) truncating OP2 to M1, doing the operation on M1,
+		 and then widening the result to N
+
+	     (b) widening OP1 to M2, doing the operation on M2, and then
+		 widening the result to N
+
+	     Both shift the M2->N widening of the inputs to the output.
+	     (a) additionally shifts the M1->M2 widening to the output;
+	     it requires fewer copies of STMT_INFO but requires an extra
+	     M2->M1 truncation.
+
+	     Which is better will depend on the complexity and cost of
+	     STMT_INFO, which is hard to predict at this stage.  However,
+	     a clear tie-breaker in favor of (b) is the fact that the
+	     truncation in (a) increases the length of the operation chain.
+
+	     If instead of (4) only one of OP1 or OP2 is single-use,
+	     (b) is still a win over doing the operation in N bits:
+	     it still shifts the M2->N widening on the single-use operand
+	     to the output and reduces the number of STMT_INFO copies.
+
+	     If neither operand is single-use then operating on fewer than
+	     N bits might lead to more extensions overall.  Whether it does
+	     or not depends on global information about the vectorization
+	     region, and whether that's a good trade-off would again
+	     depend on the complexity and cost of the statements involved,
+	     as well as things like register pressure that are not normally
+	     modelled at this stage.  We therefore ignore these cases
+	     and just optimize the clear single-use wins above.
+
+	     Thus we take the maximum precision of the unpromoted operands
+	     and record whether any operand is single-use.  */
+	  if (unprom[i].dt == vect_internal_def)
+	    {
+	      min_precision = MAX (min_precision,
+				   TYPE_PRECISION (unprom[i].type));
+	      single_use_p |= op_single_use_p;
+	    }
+	}
+    }
 
-  if (!is_gimple_assign (stmt))
-    return false;
+  /* Although the operation could be done in operation_precision, we have
+     to balance that against introducing extra truncations or extensions.
+     Calculate the minimum precision that can be handled efficiently.
+
+     The loop above determined that the operation could be handled
+     efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
+     extension from the inputs to the output without introducing more
+     instructions, and would reduce the number of instructions required
+     for STMT_INFO itself.
+
+     vect_determine_precisions has also determined that the result only
+     needs min_output_precision bits.  Truncating by a factor of N times
+     requires a tree of N - 1 instructions, so if TYPE is N times wider
+     than min_output_precision, doing the operation in TYPE and truncating
+     the result requires N + (N - 1) = 2N - 1 instructions per output vector.
+     In contrast:
+
+     - truncating the input to a unary operation and doing the operation
+       in the new type requires at most N - 1 + 1 = N instructions per
+       output vector
+
+     - doing the same for a binary operation requires at most
+       (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
+
+     Both unary and binary operations require fewer instructions than
+     this if the operands were extended from a suitable truncated form.
+     Thus there is usually nothing to lose by doing operations in
+     min_output_precision bits, but there can be something to gain.  */
+  if (!single_use_p)
+    min_precision = last_stmt_info->min_output_precision;
+  else
+    min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
 
-  code = gimple_assign_rhs_code (stmt);
-  if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
-      && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
-    return false;
+  /* Apply the minimum efficient precision we just calculated.  */
+  if (new_precision < min_precision)
+    new_precision = min_precision;
+  if (new_precision >= TYPE_PRECISION (type))
+    return NULL;
 
-  oprnd = gimple_assign_rhs1 (stmt);
-  const_oprnd = gimple_assign_rhs2 (stmt);
-  type = gimple_expr_type (stmt);
+  vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
 
-  if (TREE_CODE (oprnd) != SSA_NAME
-      || TREE_CODE (const_oprnd) != INTEGER_CST)
-    return false;
+  *type_out = get_vectype_for_scalar_type (type);
+  if (!*type_out)
+    return NULL;
 
-  /* If oprnd has other uses besides that in stmt we cannot mark it
-     as being part of a pattern only.  */
-  if (!has_single_use (oprnd))
-    return false;
+  /* We've found a viable pattern.  Get the new type of the operation.  */
+  bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
+  tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
+
+  /* We specifically don't check here whether the target supports the
+     new operation, since it might be something that a later pattern
+     wants to rewrite anyway.  If targets have a minimum element size
+     for some optabs, we should pattern-match smaller ops to larger ops
+     where beneficial.  */
+  tree new_vectype = get_vectype_for_scalar_type (new_type);
+  if (!new_vectype)
+    return NULL;
 
-  /* If we are in the middle of a sequence, we use DEF from a previous
-     statement.  Otherwise, OPRND has to be a result of type promotion.  */
-  if (*new_type)
-    {
-      half_type = *new_type;
-      oprnd = def;
-    }
-  else
+  if (dump_enabled_p ())
     {
-      first = true;
-      if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
-			      &promotion)
-	  || !promotion
-	  || !vect_same_loop_or_bb_p (stmt, def_stmt))
-        return false;
+      dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
+      dump_printf (MSG_NOTE, " to ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
+      dump_printf (MSG_NOTE, "\n");
     }
 
-  /* Can we perform the operation on a smaller type?  */
-  switch (code)
-    {
-      case BIT_IOR_EXPR:
-      case BIT_XOR_EXPR:
-      case BIT_AND_EXPR:
-        if (!int_fits_type_p (const_oprnd, half_type))
-          {
-            /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
-            if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
-              return false;
-
-            interm_type = build_nonstandard_integer_type (
-                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
-            if (!int_fits_type_p (const_oprnd, interm_type))
-              return false;
-          }
-
-        break;
-
-      case LSHIFT_EXPR:
-        /* Try intermediate type - HALF_TYPE is not enough for sure.  */
-        if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
-          return false;
-
-        /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
-          (e.g., if the original value was char, the shift amount is at most 8
-           if we want to use short).  */
-        if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
-          return false;
-
-        interm_type = build_nonstandard_integer_type (
-                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
-
-        if (!vect_supportable_shift (code, interm_type))
-          return false;
-
-        break;
-
-      case RSHIFT_EXPR:
-        if (vect_supportable_shift (code, half_type))
-          break;
-
-        /* Try intermediate type - HALF_TYPE is not supported.  */
-        if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
-          return false;
-
-        interm_type = build_nonstandard_integer_type (
-                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
-
-        if (!vect_supportable_shift (code, interm_type))
-          return false;
-
-        break;
-
-      default:
-        gcc_unreachable ();
-    }
-
-  /* There are four possible cases:
-     1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
-        the first statement in the sequence)
-        a. The original, HALF_TYPE, is not enough - we replace the promotion
-           from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
-        b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
-           promotion.
-     2. OPRND is defined by a pattern statement we created.
-        a. Its type is not sufficient for the operation, we create a new stmt:
-           a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
-           this statement in NEW_DEF_STMT, and it is later put in
-	   STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
-        b. OPRND is good to use in the new statement.  */
-  if (first)
-    {
-      if (interm_type)
-        {
-          /* Replace the original type conversion HALF_TYPE->TYPE with
-             HALF_TYPE->INTERM_TYPE.  */
-          if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
-            {
-              new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
-              /* Check if the already created pattern stmt is what we need.  */
-              if (!is_gimple_assign (new_stmt)
-                  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
-                  || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
-                return false;
-
-	      stmts->safe_push (def_stmt);
-              oprnd = gimple_assign_lhs (new_stmt);
-            }
-          else
-            {
-              /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
-              oprnd = gimple_assign_rhs1 (def_stmt);
-	      new_oprnd = make_ssa_name (interm_type);
-	      new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
-              STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
-              stmts->safe_push (def_stmt);
-              oprnd = new_oprnd;
-            }
-        }
-      else
-        {
-          /* Retrieve the operand before the type promotion.  */
-          oprnd = gimple_assign_rhs1 (def_stmt);
-        }
-    }
-  else
-    {
-      if (interm_type)
-        {
-          /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
-	  new_oprnd = make_ssa_name (interm_type);
-	  new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
-          oprnd = new_oprnd;
-          *new_def_stmt = new_stmt;
-        }
+  /* Calculate the rhs operands for an operation on NEW_TYPE.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
+  tree ops[3] = {};
+  for (unsigned int i = 1; i < first_op; ++i)
+    ops[i - 1] = gimple_op (last_stmt, i);
+  vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
+		       new_type, &unprom[0], new_vectype);
+
+  /* Use the operation to produce a result of type NEW_TYPE.  */
+  tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (new_var, code,
+					      ops[0], ops[1], ops[2]);
+  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
 
-      /* Otherwise, OPRND is already set.  */
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "created pattern stmt: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
     }
 
-  if (interm_type)
-    *new_type = interm_type;
-  else
-    *new_type = half_type;
+  pattern_stmt = vect_convert_output (last_stmt_info, type,
+				      pattern_stmt, new_vectype);
 
-  *op0 = oprnd;
-  *op1 = fold_convert (*new_type, const_oprnd);
-
-  return true;
+  stmts->safe_push (last_stmt);
+  return pattern_stmt;
 }
 
+/* Recognize cases in which the input to a cast is wider than its
+   output, and the input is fed by a widening operation.  Fold this
+   by removing the unnecessary intermediate widening.  E.g.:
 
-/* Try to find a statement or a sequence of statements that can be performed
-   on a smaller type:
+     unsigned char a;
+     unsigned int b = (unsigned int) a;
+     unsigned short c = (unsigned short) b;
 
-     type x_t;
-     TYPE x_T, res0_T, res1_T;
-   loop:
-     S1  x_t = *p;
-     S2  x_T = (TYPE) x_t;
-     S3  res0_T = op (x_T, C0);
-     S4  res1_T = op (res0_T, C1);
-     S5  ... = () res1_T;  - type demotion
-
-   where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
-   constants.
-   Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
-   be 'type' or some intermediate type.  For now, we expect S5 to be a type
-   demotion operation.  We also check that S3 and S4 have only one use.  */
+   -->
 
-static gimple *
-vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
-{
-  gimple *stmt = stmts->pop ();
-  gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
-	 *use_stmt = NULL;
-  tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
-  tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
-  bool first;
-  tree type = NULL;
-
-  first = true;
-  while (1)
-    {
-      if (!vinfo_for_stmt (stmt)
-          || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
-        return NULL;
-
-      new_def_stmt = NULL;
-      if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
-                                             &op0, &op1, &new_def_stmt,
-                                             stmts))
-        {
-          if (first)
-            return NULL;
-          else
-            break;
-        }
+     unsigned short c = (unsigned short) a;
 
-      /* STMT can be performed on a smaller type.  Check its uses.  */
-      use_stmt = vect_single_imm_use (stmt);
-      if (!use_stmt || !is_gimple_assign (use_stmt))
-        return NULL;
-
-      /* Create pattern statement for STMT.  */
-      vectype = get_vectype_for_scalar_type (new_type);
-      if (!vectype)
-        return NULL;
-
-      /* We want to collect all the statements for which we create pattern
-         statetments, except for the case when the last statement in the
-         sequence doesn't have a corresponding pattern statement.  In such
-         case we associate the last pattern statement with the last statement
-         in the sequence.  Therefore, we only add the original statement to
-         the list if we know that it is not the last.  */
-      if (prev_stmt)
-        stmts->safe_push (prev_stmt);
+   Although this is rare in input IR, it is an expected side-effect
+   of the over-widening pattern above.
 
-      var = vect_recog_temp_ssa_var (new_type, NULL);
-      pattern_stmt
-	= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
-      STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
-      new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
+   This is beneficial also for integer-to-float conversions, if the
+   widened integer has more bits than the float, and if the unwidened
+   input doesn't.  */
 
-      if (dump_enabled_p ())
-        {
-          dump_printf_loc (MSG_NOTE, vect_location,
-                           "created pattern stmt: ");
-          dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
-        }
+static gimple *
+vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+  /* Check for a cast, including an integer-to-float conversion.  */
+  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+  if (!last_stmt)
+    return NULL;
+  tree_code code = gimple_assign_rhs_code (last_stmt);
+  if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
+    return NULL;
 
-      type = gimple_expr_type (stmt);
-      prev_stmt = stmt;
-      stmt = use_stmt;
-
-      first = false;
-    }
-
-  /* We got a sequence.  We expect it to end with a type demotion operation.
-     Otherwise, we quit (for now).  There are three possible cases: the
-     conversion is to NEW_TYPE (we don't do anything), the conversion is to
-     a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
-     NEW_TYPE differs (we create a new conversion statement).  */
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
-    {
-      use_lhs = gimple_assign_lhs (use_stmt);
-      use_type = TREE_TYPE (use_lhs);
-      /* Support only type demotion or signedess change.  */
-      if (!INTEGRAL_TYPE_P (use_type)
-	  || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
-        return NULL;
+  /* Make sure that the rhs is a scalar with a natural bitsize.  */
+  tree lhs = gimple_assign_lhs (last_stmt);
+  if (!lhs)
+    return NULL;
+  tree lhs_type = TREE_TYPE (lhs);
+  scalar_mode lhs_mode;
+  if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
+      || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
+    return NULL;
 
-      /* Check that NEW_TYPE is not bigger than the conversion result.  */
-      if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
-	return NULL;
+  /* Check for a narrowing operation (from a vector point of view).  */
+  tree rhs = gimple_assign_rhs1 (last_stmt);
+  tree rhs_type = TREE_TYPE (rhs);
+  if (!INTEGRAL_TYPE_P (rhs_type)
+      || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
+      || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
+    return NULL;
 
-      if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
-          || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
-        {
-	  *type_out = get_vectype_for_scalar_type (use_type);
-	  if (!*type_out)
-	    return NULL;
+  /* Try to find an unpromoted input.  */
+  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+  vec_info *vinfo = last_stmt_info->vinfo;
+  vect_unpromoted_value unprom;
+  if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
+      || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
+    return NULL;
 
-          /* Create NEW_TYPE->USE_TYPE conversion.  */
-	  new_oprnd = make_ssa_name (use_type);
-	  pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
-          STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
-
-          /* We created a pattern statement for the last statement in the
-             sequence, so we don't need to associate it with the pattern
-             statement created for PREV_STMT.  Therefore, we add PREV_STMT
-             to the list in order to mark it later in vect_pattern_recog_1.  */
-          if (prev_stmt)
-            stmts->safe_push (prev_stmt);
-        }
-      else
-        {
-          if (prev_stmt)
-	    STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
-	       = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
+  /* If the bits above RHS_TYPE matter, make sure that they're the
+     same when extending from UNPROM as they are when extending from RHS.  */
+  if (!INTEGRAL_TYPE_P (lhs_type)
+      && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
+    return NULL;
 
-	  *type_out = vectype;
-        }
+  /* We can get the same result by casting UNPROM directly, to avoid
+     the unnecessary widening and narrowing.  */
+  vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
 
-      stmts->safe_push (use_stmt);
-    }
-  else
-    /* TODO: support general case, create a conversion to the correct type.  */
+  *type_out = get_vectype_for_scalar_type (lhs_type);
+  if (!*type_out)
     return NULL;
 
-  /* Pattern detected.  */
-  vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
+  tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
+  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
 
+  stmts->safe_push (last_stmt);
   return pattern_stmt;
 }
 
@@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<g
   return pattern_stmt;
 }
 
+/* Return true if TYPE is a non-boolean integer type.  These are the types
+   that we want to consider for narrowing.  */
+
+static bool
+vect_narrowable_type_p (tree type)
+{
+  return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+}
+
+/* Return true if the operation given by CODE can be truncated to N bits
+   when only N bits of the output are needed.  This is only true if bit N+1
+   of the inputs has no effect on the low N bits of the result.  */
+
+static bool
+vect_truncatable_operation_p (tree_code code)
+{
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+    case COND_EXPR:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+/* Record that STMT_INFO could be changed from operating on TYPE to
+   operating on a type with the precision and sign given by PRECISION
+   and SIGN respectively.  PRECISION is an arbitrary bit precision;
+   it might not be a whole number of bytes.  */
+
+static void
+vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+			 unsigned int precision, signop sign)
+{
+  /* Round the precision up to a whole number of bytes.  */
+  precision = vect_element_precision (precision);
+  if (precision < TYPE_PRECISION (type)
+      && (!stmt_info->operation_precision
+	  || stmt_info->operation_precision > precision))
+    {
+      stmt_info->operation_precision = precision;
+      stmt_info->operation_sign = sign;
+    }
+}
+
+/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+   non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
+   is an arbitrary bit precision; it might not be a whole number of bytes.  */
+
+static void
+vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+			      unsigned int min_input_precision)
+{
+  /* This operation in isolation only requires the inputs to have
+     MIN_INPUT_PRECISION of precision.  However, that doesn't mean
+     that MIN_INPUT_PRECISION is a natural precision for the chain
+     as a whole.  E.g. consider something like:
+
+	 unsigned short *x, *y;
+	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+     The right shift can be done on unsigned chars, and only requires the
+     result of "*x & 0xf0" to be done on unsigned chars.  But taking that
+     approach would mean turning a natural chain of single-vector unsigned
+     short operations into one that truncates "*x" and then extends
+     "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+     operation and one vector for each unsigned char operation.
+     This would be a significant pessimization.
+
+     Instead only propagate the maximum of this precision and the precision
+     required by the users of the result.  This means that we don't pessimize
+     the case above but continue to optimize things like:
+
+	 unsigned char *y;
+	 unsigned short *x;
+	 *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+     Here we would truncate two vectors of *x to a single vector of
+     unsigned chars and use single-vector unsigned char operations for
+     everything else, rather than doing two unsigned short copies of
+     "(*x & 0xf0) >> 4" and then truncating the result.  */
+  min_input_precision = MAX (min_input_precision,
+			     stmt_info->min_output_precision);
+
+  if (min_input_precision < TYPE_PRECISION (type)
+      && (!stmt_info->min_input_precision
+	  || stmt_info->min_input_precision > min_input_precision))
+    stmt_info->min_input_precision = min_input_precision;
+}
+
+/* Subroutine of vect_determine_min_output_precision.  Return true if
+   we can calculate a reduced number of output bits for STMT_INFO,
+   whose result is LHS.  */
+
+static bool
+vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+{
+  /* Take the maximum precision required by users of the result.  */
+  unsigned int precision = 0;
+  imm_use_iterator iter;
+  use_operand_p use;
+  FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+    {
+      gimple *use_stmt = USE_STMT (use);
+      if (is_gimple_debug (use_stmt))
+	continue;
+      if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+	return false;
+      stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+      if (!use_stmt_info->min_input_precision)
+	return false;
+      precision = MAX (precision, use_stmt_info->min_input_precision);
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+		       precision);
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+      dump_printf (MSG_NOTE, " are significant\n");
+    }
+  stmt_info->min_output_precision = precision;
+  return true;
+}
+
+/* Calculate min_output_precision for STMT_INFO.  */
+
+static void
+vect_determine_min_output_precision (stmt_vec_info stmt_info)
+{
+  /* We're only interested in statements with a narrowable result.  */
+  tree lhs = gimple_get_lhs (stmt_info->stmt);
+  if (!lhs
+      || TREE_CODE (lhs) != SSA_NAME
+      || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+    return;
+
+  if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+    stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+}
+
+/* Use range information to decide whether STMT (described by STMT_INFO)
+   could be done in a narrower type.  This is effectively a forward
+   propagation, since it uses context-independent information that applies
+   to all users of an SSA name.  */
+
+static void
+vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+    return;
+
+  tree type = TREE_TYPE (lhs);
+  if (!vect_narrowable_type_p (type))
+    return;
+
+  /* First see whether we have any useful range information for the result.  */
+  unsigned int precision = TYPE_PRECISION (type);
+  signop sign = TYPE_SIGN (type);
+  wide_int min_value, max_value;
+  if (!vect_get_range_info (lhs, &min_value, &max_value))
+    return;
+
+  tree_code code = gimple_assign_rhs_code (stmt);
+  unsigned int nops = gimple_num_ops (stmt);
+
+  if (!vect_truncatable_operation_p (code))
+    /* Check that all relevant input operands are compatible, and update
+       [MIN_VALUE, MAX_VALUE] to include their ranges.  */
+    for (unsigned int i = 1; i < nops; ++i)
+      {
+	tree op = gimple_op (stmt, i);
+	if (TREE_CODE (op) == INTEGER_CST)
+	  {
+	    /* Don't require the integer to have RHS_TYPE (which it might
+	       not for things like shift amounts, etc.), but do require it
+	       to fit the type.  */
+	    if (!int_fits_type_p (op, type))
+	      return;
+
+	    min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+	    max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+	  }
+	else if (TREE_CODE (op) == SSA_NAME)
+	  {
+	    /* Ignore codes that don't take uniform arguments.  */
+	    if (!types_compatible_p (TREE_TYPE (op), type))
+	      return;
+
+	    wide_int op_min_value, op_max_value;
+	    if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+	      return;
+
+	    min_value = wi::min (min_value, op_min_value, sign);
+	    max_value = wi::max (max_value, op_max_value, sign);
+	  }
+	else
+	  return;
+      }
+
+  /* Try to switch signed types for unsigned types if we can.
+     This is better for two reasons.  First, unsigned ops tend
+     to be cheaper than signed ops.  Second, it means that we can
+     handle things like:
+
+	signed char c;
+	int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+
+     as:
+
+	signed char c;
+	unsigned short res_1 = (unsigned short) c & 0xff00;
+	int res = (int) res_1;
+
+     where the intermediate result res_1 has unsigned rather than
+     signed type.  */
+  if (sign == SIGNED && !wi::neg_p (min_value))
+    sign = UNSIGNED;
+
+  /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
+  unsigned int precision1 = wi::min_precision (min_value, sign);
+  unsigned int precision2 = wi::min_precision (max_value, sign);
+  unsigned int value_precision = MAX (precision1, precision2);
+  if (value_precision >= precision)
+    return;
+
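+  /* For example, a 32-bit result whose range is known to be [0, 200]
+     needs only 8 bits, and because the range is nonnegative the
+     operation can be recorded as an unsigned 8-bit one.  */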
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+		       " without loss of precision: ",
+		       sign == SIGNED ? "signed" : "unsigned",
+		       value_precision);
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+    }
+
+  vect_set_operation_type (stmt_info, type, value_precision, sign);
+  vect_set_min_input_precision (stmt_info, type, value_precision);
+}
+
+/* Use information about the users of STMT's result to decide whether
+   STMT (described by STMT_INFO) could be done in a narrower type.
+   This is effectively a backward propagation.  */
+
+static void
+vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+{
+  tree_code code = gimple_assign_rhs_code (stmt);
+  unsigned int opno = (code == COND_EXPR ? 2 : 1);
+  tree type = TREE_TYPE (gimple_op (stmt, opno));
+  if (!vect_narrowable_type_p (type))
+    return;
+
+  unsigned int precision = TYPE_PRECISION (type);
+  unsigned int operation_precision, min_input_precision;
+  switch (code)
+    {
+    CASE_CONVERT:
+      /* Only the bits that contribute to the output matter.  Don't change
+	 the precision of the operation itself.  */
+      operation_precision = precision;
+      min_input_precision = stmt_info->min_output_precision;
+      break;
+
+    case LSHIFT_EXPR:
+    case RSHIFT_EXPR:
+      {
+	tree shift = gimple_assign_rhs2 (stmt);
+	if (TREE_CODE (shift) != INTEGER_CST
+	    || !wi::ltu_p (wi::to_widest (shift), precision))
+	  return;
+	unsigned int const_shift = TREE_INT_CST_LOW (shift);
+	if (code == LSHIFT_EXPR)
+	  {
+	    /* We need CONST_SHIFT fewer bits of the input.  */
+	    operation_precision = stmt_info->min_output_precision;
+	    min_input_precision = (MAX (operation_precision, const_shift)
+				    - const_shift);
+	  }
+	else
+	  {
+	    /* We need CONST_SHIFT extra bits to do the operation.  */
+	    operation_precision = (stmt_info->min_output_precision
+				   + const_shift);
+	    min_input_precision = operation_precision;
+	  }
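+	/* For example, with CONST_SHIFT == 3 and a min_output_precision
+	   of 16, a left shift needs only the low 13 bits of its input,
+	   whereas a right shift needs a 19-bit operation to produce all
+	   16 significant output bits.  */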
+	break;
+      }
+
+    default:
+      if (vect_truncatable_operation_p (code))
+	{
+	  /* Input bit N has no effect on output bits N-1 and lower.  */
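+	  /* For example, for the PLUS_EXPR in "a[i] = (b[i] + c[i]) >> 1"
+	     with char elements, the shift needs only the low 9 bits of
+	     the sum, so the addition itself can be done in 9 bits.  */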
+	  operation_precision = stmt_info->min_output_precision;
+	  min_input_precision = operation_precision;
+	  break;
+	}
+      return;
+    }
+
+  if (operation_precision < precision)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+			   " without affecting users: ",
+			   TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+			   operation_precision);
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	}
+      vect_set_operation_type (stmt_info, type, operation_precision,
+			       TYPE_SIGN (type));
+    }
+  vect_set_min_input_precision (stmt_info, type, min_input_precision);
+}
+
+/* Handle vect_determine_precisions for STMT_INFO, given that we
+   have already done so for the users of its result.  */
+
+void
+vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+{
+  vect_determine_min_output_precision (stmt_info);
+  if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+    {
+      vect_determine_precisions_from_range (stmt_info, stmt);
+      vect_determine_precisions_from_users (stmt_info, stmt);
+    }
+}
+
+/* Walk backwards through the vectorizable region to determine the
+   values of these fields:
+
+   - min_output_precision
+   - min_input_precision
+   - operation_precision
+   - operation_sign.
+
+   Walking backwards ensures that each statement's users are processed
+   first, so that a statement's min_output_precision can be derived
+   from its users' min_input_precision.  */
+
+void
+vect_determine_precisions (vec_info *vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_determine_precisions");
+
+  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+    {
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+
+      for (unsigned int i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[nbbs - i - 1];
+	  for (gimple_stmt_iterator si = gsi_last_bb (bb);
+	       !gsi_end_p (si); gsi_prev (&si))
+	    vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+	}
+    }
+  else
+    {
+      bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+      gimple_stmt_iterator si = bb_vinfo->region_end;
+      gimple *stmt;
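+      /* Walk backwards over [region_begin, region_end), handling the
+	 case in which region_end is the end-of-block iterator.  */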
+      do
+	{
+	  if (!gsi_stmt (si))
+	    si = gsi_last_bb (bb_vinfo->bb);
+	  else
+	    gsi_prev (&si);
+	  stmt = gsi_stmt (si);
+	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+	  if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+	    vect_determine_stmt_precisions (stmt_info);
+	}
+      while (stmt != gsi_stmt (bb_vinfo->region_begin));
+    }
+}
+
 typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
 
 struct vect_recog_func
@@ -4217,13 +4566,14 @@ struct vect_recog_func
   taken which means usually the more complex one needs to precede the
   less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_over_widening_pattern, "over_widening" },
+  { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
   { vect_recog_widen_mult_pattern, "widen_mult" },
   { vect_recog_dot_prod_pattern, "dot_prod" },
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
   { vect_recog_pow_pattern, "pow" },
   { vect_recog_widen_shift_pattern, "widen_shift" },
-  { vect_recog_over_widening_pattern, "over_widening" },
   { vect_recog_rotate_pattern, "rotate" },
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
@@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
   unsigned int i, j;
   auto_vec<gimple *, 1> stmts_to_replace;
 
+  vect_determine_precisions (vinfo);
+
   DUMP_VECT_SCOPE ("vect_pattern_recog");
 
   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c	2016-11-11 17:07:36.776796115 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c	2018-07-03 09:02:36.567413531 +0100
@@ -43,5 +43,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
+/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2018-07-03 09:02:36.563413564 +0100
@@ -62,8 +62,9 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2018-07-03 09:02:36.563413564 +0100
@@ -58,7 +58,9 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	2018-07-03 09:02:36.563413564 +0100
@@ -57,7 +57,12 @@ int main (void)
   return 0;
 }
 
-/* Final value stays in int, so no over-widening is detected at the moment.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
+/* This is an over-widening even though the final result is still an int.
+   It's better to do one vector of ops on chars and then widen than to
+   widen and then do 4 vectors of ops on ints.  */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	2018-07-03 09:02:36.563413564 +0100
@@ -57,7 +57,12 @@ int main (void)
   return 0;
 }
 
-/* Final value stays in int, so no over-widening is detected at the moment.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
+/* This is an over-widening even though the final result is still an int.
+   It's better to do one vector of ops on chars and then widen than to
+   widen and then do 4 vectors of ops on ints.  */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	2018-07-03 09:02:36.563413564 +0100
@@ -57,6 +57,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2018-07-03 09:02:36.563413564 +0100
@@ -59,7 +59,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2018-07-03 09:02:36.563413564 +0100
@@ -66,8 +66,9 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2018-07-03 09:02:36.563413564 +0100
@@ -62,7 +62,9 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,66 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+/* Deliberate use of signed >>.  */
+#define DEF_LOOP(SIGNEDNESS)			\
+  void __attribute__ ((noipa))			\
+  f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+		  SIGNEDNESS char *restrict b,	\
+		  SIGNEDNESS char *restrict c)	\
+  {						\
+    a[0] = (b[0] + c[0]) >> 1;			\
+    a[1] = (b[1] + c[1]) >> 1;			\
+    a[2] = (b[2] + c[2]) >> 1;			\
+    a[3] = (b[3] + c[3]) >> 1;			\
+    a[4] = (b[4] + c[4]) >> 1;			\
+    a[5] = (b[5] + c[5]) >> 1;			\
+    a[6] = (b[6] + c[6]) >> 1;			\
+    a[7] = (b[7] + c[7]) >> 1;			\
+    a[8] = (b[8] + c[8]) >> 1;			\
+    a[9] = (b[9] + c[9]) >> 1;			\
+    a[10] = (b[10] + c[10]) >> 1;		\
+    a[11] = (b[11] + c[11]) >> 1;		\
+    a[12] = (b[12] + c[12]) >> 1;		\
+    a[13] = (b[13] + c[13]) >> 1;		\
+    a[14] = (b[14] + c[14]) >> 1;		\
+    a[15] = (b[15] + c[15]) >> 1;		\
+  }
+
+DEF_LOOP (signed)
+DEF_LOOP (unsigned)
+
+#define N 16
+
+#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C)		\
+  {							\
+    SIGNEDNESS char a[N], b[N], c[N];			\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = BASE_B + i * 15;				\
+	c[i] = BASE_C + i * 14;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##SIGNEDNESS (a, b, c);				\
+    for (int i = 0; i < N; ++i)				\
+      if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
+	__builtin_abort ();				\
+  }
+
+int
+main (void)
+{
+  check_vect ();
+
+  TEST_LOOP (signed, -128, -120);
+  TEST_LOOP (unsigned, 4, 10);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+/* Deliberate use of signed >>.  */
+#define DEF_LOOP(SIGNEDNESS)			\
+  void __attribute__ ((noipa))			\
+  f_##SIGNEDNESS (SIGNEDNESS char *restrict a,	\
+		  SIGNEDNESS char *restrict b,	\
+		  SIGNEDNESS char c)		\
+  {						\
+    a[0] = (b[0] + c) >> 1;			\
+    a[1] = (b[1] + c) >> 1;			\
+    a[2] = (b[2] + c) >> 1;			\
+    a[3] = (b[3] + c) >> 1;			\
+    a[4] = (b[4] + c) >> 1;			\
+    a[5] = (b[5] + c) >> 1;			\
+    a[6] = (b[6] + c) >> 1;			\
+    a[7] = (b[7] + c) >> 1;			\
+    a[8] = (b[8] + c) >> 1;			\
+    a[9] = (b[9] + c) >> 1;			\
+    a[10] = (b[10] + c) >> 1;			\
+    a[11] = (b[11] + c) >> 1;			\
+    a[12] = (b[12] + c) >> 1;			\
+    a[13] = (b[13] + c) >> 1;			\
+    a[14] = (b[14] + c) >> 1;			\
+    a[15] = (b[15] + c) >> 1;			\
+  }
+
+DEF_LOOP (signed)
+DEF_LOOP (unsigned)
+
+#define N 16
+
+#define TEST_LOOP(SIGNEDNESS, BASE_B, C)		\
+  {							\
+    SIGNEDNESS char a[N], b[N], c[N];			\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = BASE_B + i * 15;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##SIGNEDNESS (a, b, C);				\
+    for (int i = 0; i < N; ++i)				\
+      if (a[i] != (BASE_B + C + i * 15) >> 1)		\
+	__builtin_abort ();				\
+  }
+
+int
+main (void)
+{
+  check_vect ();
+
+  TEST_LOOP (signed, -128, -120);
+  TEST_LOOP (unsigned, 4, 250);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+   that these calculations can be done in SIGNEDNESS short.  */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c)
+{
+  /* Deliberate use of signed >>.  */
+  for (int i = 0; i < N; ++i)
+    a[i] = (b[i] + c[i]) >> 1;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS char a[N], b[N], c[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+
+#include "vect-over-widen-5.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#define D -120
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+   that these calculations can be done in SIGNEDNESS short.  */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+{
+  int promoted_d = d;
+  for (int i = 0; i < N; ++i)
+    /* Deliberate use of signed >>.  */
+    a[i] = (b[i] + c[i] + promoted_d) >> 2;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS char a[N], b[N], c[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, D);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c	2018-07-03 09:02:36.567413531 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#define D 251
+#endif
+
+#include "vect-over-widen-7.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c	2018-07-03 09:02:36.567413531 +0100
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+   that these calculations can be done in SIGNEDNESS short.  */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      /* Deliberate use of signed >>.  */
+      int res = b[i] + c[i];
+      a[i] = (res + (res >> 1)) >> 2;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS char a[N], b[N], c[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c);
+  for (int i = 0; i < N; ++i)
+    {
+      int res = BASE_B + BASE_C + i * 9;
+      if (a[i] != ((res + (res >> 1)) >> 2))
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-9.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+   that these calculations can be done in SIGNEDNESS short, with "res"
+   being extended for the store to d[i].  */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c, int *restrict d)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      /* Deliberate use of signed >>.  */
+      int res = b[i] + c[i];
+      a[i] = (res + (res >> 1)) >> 2;
+      d[i] = res;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS char a[N], b[N], c[N];
+  int d[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, d);
+  for (int i = 0; i < N; ++i)
+    {
+      int res = BASE_B + BASE_C + i * 9;
+      if (a[i] != ((res + (res >> 1)) >> 2))
+	__builtin_abort ();
+      if (d[i] != res)
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-11.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -120
+#endif
+
+#define N 50
+
+/* We rely on range analysis to show that these calculations can be done
+   in SIGNEDNESS short.  */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    a[i] = (b[i] + c[i]) / 2;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS char a[N], b[N], c[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-13.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -120
+#endif
+
+#define N 50
+
+/* We rely on range analysis to show that these calculations can be done
+   in SIGNEDNESS short, with the result being extended to int for the
+   store.  */
+void __attribute__ ((noipa))
+f (int *restrict a, SIGNEDNESS char *restrict b,
+   SIGNEDNESS char *restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    a[i] = (b[i] + c[i]) / 2;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  int a[N];
+  SIGNEDNESS char b[N], c[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = BASE_B + i * 5;
+      c[i] = BASE_C + i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-15.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,46 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 1024
+
+/* This should not be treated as an over-widening pattern, even though
+   "(b[i] & 0xef) | 0x80" could be done in unsigned chars: the addition
+   and the final store still need all 16 bits, and the inputs are not
+   extended from narrower values, so narrowing the subexpression would
+   only introduce extra conversions.  */
+
+void __attribute__ ((noipa))
+f (unsigned short *restrict a, unsigned short *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+      a[i] = foo;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned short a[N], b[N];
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = i;
+      b[i] = i * 3;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 1024
+
+/* This should be treated as an over-widening pattern: we can truncate
+   b to unsigned char after loading it and do all the computation in
+   unsigned char.  */
+
+void __attribute__ ((noipa))
+f (unsigned char *restrict a, unsigned short *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+      a[i] = foo;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned char a[N];
+  unsigned short b[N];
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = i;
+      b[i] = i * 3;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+   to reuse the extensions of di and ei for di + ei than to add them
+   as shorts and introduce a third extension.  */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+   unsigned int *restrict c, unsigned char *restrict d,
+   unsigned char *restrict e)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      unsigned int di = d[i];
+      unsigned int ei = e[i];
+      a[i] = di;
+      b[i] = ei;
+      c[i] = di + ei;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned int a[N], b[N], c[N];
+  unsigned char d[N], e[N];
+  for (int i = 0; i < N; ++i)
+    {
+      d[i] = i * 2 + 3;
+      e[i] = i + 100;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, d, e);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != i * 2 + 3
+	|| b[i] != i + 100
+	|| c[i] != i * 3 + 103)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+   to reuse the extensions of di and ei for di + ei than to add them
+   as shorts and introduce a third extension.  */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+   unsigned int *restrict c, unsigned char *restrict d,
+   unsigned char *restrict e)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int di = d[i];
+      int ei = e[i];
+      a[i] = di;
+      b[i] = ei;
+      c[i] = di + ei;
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned int a[N], b[N], c[N];
+  unsigned char d[N], e[N];
+  for (int i = 0; i < N; ++i)
+    {
+      d[i] = i * 2 + 3;
+      e[i] = i + 100;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, d, e);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != i * 2 + 3
+	|| b[i] != i + 100
+	|| c[i] != i * 3 + 103)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
--- /dev/null	2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c	2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+   to reuse the extensions of d[i] and e[i] for d[i] + e[i] than to add
+   them as shorts and introduce a third extension.  */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+   unsigned int *restrict c, unsigned char *restrict d,
+   unsigned char *restrict e)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      a[i] = d[i];
+      b[i] = e[i];
+      c[i] = d[i] + e[i];
+    }
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned int a[N], b[N], c[N];
+  unsigned char d[N], e[N];
+  for (int i = 0; i < N; ++i)
+    {
+      d[i] = i * 2 + 3;
+      e[i] = i + 100;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, d, e);
+  for (int i = 0; i < N; ++i)
+    if (a[i] != i * 2 + 3
+	|| b[i] != i + 100
+	|| c[i] != i * 3 + 103)
+      __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */


* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-03 10:02     ` Richard Sandiford
@ 2018-07-03 20:08       ` Christophe Lyon
  2018-07-03 20:39         ` Rainer Orth
  2018-07-04  7:18         ` Richard Sandiford
  0 siblings, 2 replies; 10+ messages in thread
From: Christophe Lyon @ 2018-07-03 20:08 UTC (permalink / raw)
  To: Richard Biener, gcc Patches, Richard Sandiford

On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> > This patch is the main part of PR85694.  The aim is to recognise at least:
> >> >
> >> >   signed char *a, *b, *c;
> >> >   ...
> >> >   for (int i = 0; i < 2048; i++)
> >> >     c[i] = (a[i] + b[i]) >> 1;
> >> >
> >> > as an over-widening pattern, since the addition and shift can be done
> >> > on shorts rather than ints.  However, it ended up being a lot more
> >> > general than that.
> >> >
> >> > The current over-widening pattern detection is limited to a few simple
> >> > cases: logical ops with immediate second operands, and shifts by a
> >> > constant.  These cases are enough for common pixel-format conversion
> >> > and can be detected in a peephole way.
> >> >
> >> > The loop above requires two generalisations of the current code: support
> >> > for addition as well as logical ops, and support for non-constant second
> >> > operands.  These are harder to detect in the same peephole way, so the
> >> > patch tries to take a more global approach.
> >> >
> >> > The idea is to get information about the minimum operation width
> >> > in two ways:
> >> >
> >> > (1) by using the range information attached to the SSA_NAMEs
> >> >     (effectively a forward walk, since the range info is
> >> >     context-independent).
> >> >
> >> > (2) by back-propagating the number of output bits required by
> >> >     users of the result.
> >> >
> >> > As explained in the comments, there's a balance to be struck between
> >> > narrowing an individual operation and fitting in with the surrounding
> >> > code.  The approach is pretty conservative: if we could narrow an
> >> > operation to N bits without changing its semantics, it's OK to do that if:
> >> >
> >> > - no operations later in the chain require more than N bits; or
> >> >
> >> > - all internally-defined inputs are extended from N bits or fewer,
> >> >   and at least one of them is single-use.
> >> >
> >> > See the comments for the rationale.
> >> >
> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> >> > since the code seemed more readable without.
> >> >
> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>
> >> Here's a version rebased on top of current trunk.  Changes from last time:
> >>
> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
> >>   prototype
> >>
> >> - fix a typo in a comment
> >>
> >> - use vect_element_precision from the new version of 12/n.
> >>
> >> Tested as before.  OK to install?
> >
> > OK.
>
> Thanks.  For the record, here's what I installed (updated on top of
> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>
> Richard
>
Hi,

It seems the new bb-slp-over-widen tests lack a -fdump option:

gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file does not exist
UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "basic block vectorized" 2

Christophe

>
> 2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * poly-int.h (print_hex): New function.
>         * dumpfile.h (dump_dec, dump_hex): Declare.
>         * dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
>         * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
>         min_input_precision, operation_precision and operation_sign.
>         * tree-vect-patterns.c (vect_get_range_info): New function.
>         (vect_same_loop_or_bb_p, vect_single_imm_use)
>         (vect_operation_fits_smaller_type): Delete.
>         (vect_look_through_possible_promotion): Add an optional
>         single_use_p parameter.
>         (vect_recog_over_widening_pattern): Rewrite to use new
>         stmt_vec_info information.  Handle one operation at a time.
>         (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
>         (vect_truncatable_operation_p, vect_set_operation_type)
>         (vect_set_min_input_precision): New functions.
>         (vect_determine_min_output_precision_1): Likewise.
>         (vect_determine_min_output_precision): Likewise.
>         (vect_determine_precisions_from_range): Likewise.
>         (vect_determine_precisions_from_users): Likewise.
>         (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
>         (vect_vect_recog_func_ptrs): Put over_widening first.
>         Add cast_forwprop.
>         (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
>         widen_mult pattern.
>         * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
>         over-widening messages.
>         * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
>         * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
>         * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-5.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-6.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-7.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-8.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-9.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-10.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-11.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-12.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-13.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-14.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-15.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-16.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-17.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-18.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-19.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-20.c: Likewise.
>         * gcc.dg/vect/vect-over-widen-21.c: Likewise.
> ------------------------------------------------------------------------------
>
> Index: gcc/poly-int.h
> ===================================================================
> --- gcc/poly-int.h      2018-07-03 09:01:31.075962445 +0100
> +++ gcc/poly-int.h      2018-07-03 09:02:36.563413564 +0100
> @@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &val
>              poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
>  }
>
> +/* Use print_hex to print VALUE to FILE.  */
> +
> +template<unsigned int N, typename C>
> +void
> +print_hex (const poly_int_pod<N, C> &value, FILE *file)
> +{
> +  if (value.is_constant ())
> +    print_hex (value.coeffs[0], file);
> +  else
> +    {
> +      fprintf (file, "[");
> +      for (unsigned int i = 0; i < N; ++i)
> +       {
> +         print_hex (value.coeffs[i], file);
> +         fputc (i == N - 1 ? ']' : ',', file);
> +       }
> +    }
> +}
> +
>  /* Helper for calculating the distance between two points P1 and P2,
>     in cases where known_le (P1, P2).  T1 and T2 are the types of the
>     two positions, in either order.  The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> --- gcc/dumpfile.h      2018-07-02 14:30:09.280175397 +0100
> +++ gcc/dumpfile.h      2018-07-03 09:02:36.563413564 +0100
> @@ -436,6 +436,8 @@ extern bool enable_rtl_dump_file (void);
>
>  template<unsigned int N, typename C>
>  void dump_dec (dump_flags_t, const poly_int<N, C> &);
> +extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> +extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
>  /* In tree-dump.c  */
>  extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> --- gcc/dumpfile.c      2018-07-03 09:01:31.071962478 +0100
> +++ gcc/dumpfile.c      2018-07-03 09:02:36.563413564 +0100
> @@ -597,6 +597,28 @@ template void dump_dec (dump_flags_t, co
>  template void dump_dec (dump_flags_t, const poly_offset_int &);
>  template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> +void
> +dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> +{
> +  if (dump_file && (dump_kind & pflags))
> +    print_dec (value, dump_file, sgn);
> +
> +  if (alt_dump_file && (dump_kind & alt_flags))
> +    print_dec (value, alt_dump_file, sgn);
> +}
> +
> +/* Output VALUE in hexadecimal to appropriate dump streams.  */
> +
> +void
> +dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> +{
> +  if (dump_file && (dump_kind & pflags))
> +    print_hex (value, dump_file);
> +
> +  if (alt_dump_file && (dump_kind & alt_flags))
> +    print_hex (value, alt_dump_file);
> +}
> +
>  /* The current dump scope-nesting depth.  */
>
>  static int dump_scope_depth;
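With these wrappers in place, the range dumps from vect_get_range_info
further down come out as, e.g.:

  x_1 has range [0x0, 0x1fe]

Since print_hex prints a constant poly_wide_int as a bare hex number,
the common non-poly case stays readable.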
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2018-07-03 09:01:31.079962411 +0100
> +++ gcc/tree-vectorizer.h       2018-07-03 09:02:36.567413531 +0100
> @@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
>
>    /* The number of scalar stmt references from active SLP instances.  */
>    unsigned int num_slp_uses;
> +
> +  /* If nonzero, the lhs of the statement could be truncated to this
> +     many bits without affecting any users of the result.  */
> +  unsigned int min_output_precision;
> +
> +  /* If nonzero, all non-boolean input operands have the same precision,
> +     and they could each be truncated to this many bits without changing
> +     the result.  */
> +  unsigned int min_input_precision;
> +
> +  /* If OPERATION_BITS is nonzero, the statement could be performed on
> +     an integer with the sign and number of bits given by OPERATION_SIGN
> +     and OPERATION_BITS without changing the result.  */
> +  unsigned int operation_precision;
> +  signop operation_sign;
>  } *stmt_vec_info;
>
>  /* Information about a gather/scatter call.  */
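To make the new fields concrete, here is a worked example with
hypothetical values (the exact numbers depend on the propagation in
tree-vect-patterns.c below and on vect_element_precision rounding):

  void
  f (unsigned char *a, unsigned char *b, unsigned char *out, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        int t1 = (int) a[i] + (int) b[i];  /* range [0, 510] */
        int t2 = t1 >> 1;
        out[i] = (unsigned char) t2;
      }
  }

For t1's statement one would expect roughly:

  min_output_precision = 9   (the shift user needs 8 + 1 input bits)
  min_input_precision  = 9
  operation_precision  = 16  (9 bits rounded up to a full element)
  operation_sign       = UNSIGNED (the range is known non-negative)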
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2018-07-03 09:01:31.035962780 +0100
> +++ gcc/tree-vect-patterns.c    2018-07-03 09:02:36.567413531 +0100
> @@ -47,6 +47,40 @@ Software Foundation; either version 3, o
>  #include "omp-simd-clone.h"
>  #include "predict.h"
>
> +/* Return true if we have a useful VR_RANGE range for VAR, storing it
> +   in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
> +
> +static bool
> +vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> +{
> +  value_range_type vr_type = get_range_info (var, min_value, max_value);
> +  wide_int nonzero = get_nonzero_bits (var);
> +  signop sgn = TYPE_SIGN (TREE_TYPE (var));
> +  if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> +                                        nonzero, sgn) == VR_RANGE)
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has range [");
> +         dump_hex (MSG_NOTE, *min_value);
> +         dump_printf (MSG_NOTE, ", ");
> +         dump_hex (MSG_NOTE, *max_value);
> +         dump_printf (MSG_NOTE, "]\n");
> +       }
> +      return true;
> +    }
> +  else
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> +         dump_printf (MSG_NOTE, " has no range info\n");
> +       }
> +      return false;
> +    }
> +}
> +
>  /* Report that we've found an instance of pattern PATTERN in
>     statement STMT.  */
>
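The intersection with get_nonzero_bits is what lets masked values
narrow even when the recorded range is loose.  A hedged example:

  unsigned char x;
  int masked = (int) x & 0xf0;

Even if VRP only recorded the range [0, 255] for MASKED, the
nonzero-bits mask 0xf0 lets intersect_range_with_nonzero_bits tighten
the maximum to 0xf0, so the statement can still be narrowed.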
> @@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree ot
>    return true;
>  }
>
> -/* Check whether STMT2 is in the same loop or basic block as STMT1.
> -   Which of the two applies depends on whether we're currently doing
> -   loop-based or basic-block-based vectorization, as determined by
> -   the vinfo_for_stmt for STMT1 (which must be defined).
> -
> -   If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> -   to be defined as well.  */
> -
> -static bool
> -vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> -{
> -  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> -  return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> -}
> -
> -/* If the LHS of DEF_STMT has a single use, and that statement is
> -   in the same loop or basic block, return it.  */
> -
> -static gimple *
> -vect_single_imm_use (gimple *def_stmt)
> -{
> -  tree lhs = gimple_assign_lhs (def_stmt);
> -  use_operand_p use_p;
> -  gimple *use_stmt;
> -
> -  if (!single_imm_use (lhs, &use_p, &use_stmt))
> -    return NULL;
> -
> -  if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> -    return NULL;
> -
> -  return use_stmt;
> -}
> -
>  /* Round bit precision PRECISION up to a full element.  */
>
>  static unsigned int
> @@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_i
>     is possible to convert OP' back to OP using a possible sign change
>     followed by a possible promotion P.  Return this OP', or null if OP is
>     not a vectorizable SSA name.  If there is a promotion P, describe its
> -   input in UNPROM, otherwise describe OP' in UNPROM.
> +   input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
> +   is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> +   have more than one user.
>
>     A successful return means that it is possible to go from OP' to OP
>     via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
> @@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_i
>
>  static tree
>  vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> -                                     vect_unpromoted_value *unprom)
> +                                     vect_unpromoted_value *unprom,
> +                                     bool *single_use_p = NULL)
>  {
>    tree res = NULL_TREE;
>    tree op_type = TREE_TYPE (op);
> @@ -420,7 +423,14 @@ vect_look_through_possible_promotion (ve
>        if (!def_stmt)
>         break;
>        if (dt == vect_internal_def)
> -       caster = vinfo_for_stmt (def_stmt);
> +       {
> +         caster = vinfo_for_stmt (def_stmt);
> +         /* Ignore pattern statements, since we don't link uses for them.  */
> +         if (single_use_p
> +             && !STMT_VINFO_RELATED_STMT (caster)
> +             && !has_single_use (res))
> +           *single_use_p = false;
> +       }
>        else
>         caster = NULL;
>        gassign *assign = dyn_cast <gassign *> (def_stmt);
> @@ -1371,363 +1381,318 @@ vect_recog_widen_sum_pattern (vec<gimple
>    return pattern_stmt;
>  }
>
> +/* Recognize cases in which an operation is performed in one type WTYPE
> +   but could be done more efficiently in a narrower type NTYPE.  For example,
> +   if we have:
> +
> +     ATYPE a;  // narrower than NTYPE
> +     BTYPE b;  // narrower than NTYPE
> +     WTYPE aw = (WTYPE) a;
> +     WTYPE bw = (WTYPE) b;
> +     WTYPE res = aw + bw;  // only uses of aw and bw
> +
> +   then it would be more efficient to do:
> +
> +     NTYPE an = (NTYPE) a;
> +     NTYPE bn = (NTYPE) b;
> +     NTYPE resn = an + bn;
> +     WTYPE res = (WTYPE) resn;
> +
> +   Other situations include things like:
> +
> +     ATYPE a;  // NTYPE or narrower
> +     WTYPE aw = (WTYPE) a;
> +     WTYPE res = aw + b;
> +
> +   when only "(NTYPE) res" is significant.  In that case it's more efficient
> +   to truncate "b" and do the operation on NTYPE instead:
> +
> +     NTYPE an = (NTYPE) a;
> +     NTYPE bn = (NTYPE) b;  // truncation
> +     NTYPE resn = an + bn;
> +     WTYPE res = (WTYPE) resn;
> +
> +   All users of "res" should then use "resn" instead, making the final
> +   statement dead (not marked as relevant).  The final statement is still
> +   needed to maintain the type correctness of the IR.
> +
> +   vect_determine_precisions has already determined the minimum
> +   precision of the operation and the minimum precision required
> +   by users of the result.  */
>
> -/* Return TRUE if the operation in STMT can be performed on a smaller type.
> +static gimple *
> +vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> +{
> +  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> +  if (!last_stmt)
> +    return NULL;
>
> -   Input:
> -   STMT - a statement to check.
> -   DEF - we support operations with two operands, one of which is constant.
> -         The other operand can be defined by a demotion operation, or by a
> -         previous statement in a sequence of over-promoted operations.  In the
> -         later case DEF is used to replace that operand.  (It is defined by a
> -         pattern statement we created for the previous statement in the
> -         sequence).
> -
> -   Input/output:
> -   NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
> -         NULL, it's the type of DEF.
> -   STMTS - additional pattern statements.  If a pattern statement (type
> -         conversion) is created in this function, its original statement is
> -         added to STMTS.
> +  /* See whether we have found that this operation can be done on a
> +     narrower type without changing its semantics.  */
> +  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> +  unsigned int new_precision = last_stmt_info->operation_precision;
> +  if (!new_precision)
> +    return NULL;
>
> -   Output:
> -   OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> -         operands to use in the new pattern statement for STMT (will be created
> -         in vect_recog_over_widening_pattern ()).
> -   NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> -         statements for STMT: the first one is a type promotion and the second
> -         one is the operation itself.  We return the type promotion statement
> -        in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> -         the second pattern statement.  */
> +  vec_info *vinfo = last_stmt_info->vinfo;
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +  tree type = TREE_TYPE (lhs);
> +  tree_code code = gimple_assign_rhs_code (last_stmt);
> +
> +  /* Keep the first operand of a COND_EXPR as-is: only the other two
> +     operands are interesting.  */
> +  unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> -static bool
> -vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> -                                 tree *op0, tree *op1, gimple **new_def_stmt,
> -                                 vec<gimple *> *stmts)
> -{
> -  enum tree_code code;
> -  tree const_oprnd, oprnd;
> -  tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> -  gimple *def_stmt, *new_stmt;
> -  bool first = false;
> -  bool promotion;
> +  /* Check the operands.  */
> +  unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> +  auto_vec <vect_unpromoted_value, 3> unprom (nops);
> +  unprom.quick_grow (nops);
> +  unsigned int min_precision = 0;
> +  bool single_use_p = false;
> +  for (unsigned int i = 0; i < nops; ++i)
> +    {
> +      tree op = gimple_op (last_stmt, first_op + i);
> +      if (TREE_CODE (op) == INTEGER_CST)
> +       unprom[i].set_op (op, vect_constant_def);
> +      else if (TREE_CODE (op) == SSA_NAME)
> +       {
> +         bool op_single_use_p = true;
> +         if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> +                                                    &op_single_use_p))
> +           return NULL;
> +         /* If:
>
> -  *op0 = NULL_TREE;
> -  *op1 = NULL_TREE;
> -  *new_def_stmt = NULL;
> +            (1) N bits of the result are needed;
> +            (2) all inputs are widened from M<N bits; and
> +            (3) one operand OP is a single-use SSA name
> +
> +            we can shift the M->N widening from OP to the output
> +            without changing the number or type of extensions involved.
> +            This then reduces the number of copies of STMT_INFO.
> +
> +            If instead of (3) more than one operand is a single-use SSA name,
> +            shifting the extension to the output is even more of a win.
> +
> +            If instead:
> +
> +            (1) N bits of the result are needed;
> +            (2) one operand OP2 is widened from M2<N bits;
> +            (3) another operand OP1 is widened from M1<M2 bits; and
> +            (4) both OP1 and OP2 are single-use
> +
> +            the choice is between:
> +
> +            (a) truncating OP2 to M1, doing the operation on M1,
> +                and then widening the result to N
> +
> +            (b) widening OP1 to M2, doing the operation on M2, and then
> +                widening the result to N
> +
> +            Both shift the M2->N widening of the inputs to the output.
> +            (a) additionally shifts the M1->M2 widening to the output;
> +            it requires fewer copies of STMT_INFO but requires an extra
> +            M2->M1 truncation.
> +
> +            Which is better will depend on the complexity and cost of
> +            STMT_INFO, which is hard to predict at this stage.  However,
> +            a clear tie-breaker in favor of (b) is the fact that the
> +            truncation in (a) increases the length of the operation chain.
> +
> +            If instead of (4) only one of OP1 or OP2 is single-use,
> +            (b) is still a win over doing the operation in N bits:
> +            it still shifts the M2->N widening on the single-use operand
> +            to the output and reduces the number of STMT_INFO copies.
> +
> +            If neither operand is single-use then operating on fewer than
> +            N bits might lead to more extensions overall.  Whether it does
> +            or not depends on global information about the vectorization
> +            region, and whether that's a good trade-off would again
> +            depend on the complexity and cost of the statements involved,
> +            as well as things like register pressure that are not normally
> +            modelled at this stage.  We therefore ignore these cases
> +            and just optimize the clear single-use wins above.
> +
> +            Thus we take the maximum precision of the unpromoted operands
> +            and record whether any operand is single-use.  */
> +         if (unprom[i].dt == vect_internal_def)
> +           {
> +             min_precision = MAX (min_precision,
> +                                  TYPE_PRECISION (unprom[i].type));
> +             single_use_p |= op_single_use_p;
> +           }
> +       }
> +    }
>
> -  if (!is_gimple_assign (stmt))
> -    return false;
> +  /* Although the operation could be done in operation_precision, we have
> +     to balance that against introducing extra truncations or extensions.
> +     Calculate the minimum precision that can be handled efficiently.
> +
> +     The loop above determined that the operation could be handled
> +     efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> +     extension from the inputs to the output without introducing more
> +     instructions, and would reduce the number of instructions required
> +     for STMT_INFO itself.
> +
> +     vect_determine_precisions has also determined that the result only
> +     needs min_output_precision bits.  Truncating by a factor of N times
> +     requires a tree of N - 1 instructions, so if TYPE is N times wider
> +     than min_output_precision, doing the operation in TYPE and truncating
> +     the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> +     In contrast:
> +
> +     - truncating the input to a unary operation and doing the operation
> +       in the new type requires at most N - 1 + 1 = N instructions per
> +       output vector
> +
> +     - doing the same for a binary operation requires at most
> +       (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> +
> +     Both unary and binary operations require fewer instructions than
> +     this if the operands were extended from a suitable truncated form.
> +     Thus there is usually nothing to lose by doing operations in
> +     min_output_precision bits, but there can be something to gain.  */
> +  if (!single_use_p)
> +    min_precision = last_stmt_info->min_output_precision;
> +  else
> +    min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
> -  code = gimple_assign_rhs_code (stmt);
> -  if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> -      && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> -    return false;
> +  /* Apply the minimum efficient precision we just calculated.  */
> +  if (new_precision < min_precision)
> +    new_precision = min_precision;
> +  if (new_precision >= TYPE_PRECISION (type))
> +    return NULL;
>
> -  oprnd = gimple_assign_rhs1 (stmt);
> -  const_oprnd = gimple_assign_rhs2 (stmt);
> -  type = gimple_expr_type (stmt);
> +  vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> -  if (TREE_CODE (oprnd) != SSA_NAME
> -      || TREE_CODE (const_oprnd) != INTEGER_CST)
> -    return false;
> +  *type_out = get_vectype_for_scalar_type (type);
> +  if (!*type_out)
> +    return NULL;
>
> -  /* If oprnd has other uses besides that in stmt we cannot mark it
> -     as being part of a pattern only.  */
> -  if (!has_single_use (oprnd))
> -    return false;
> +  /* We've found a viable pattern.  Get the new type of the operation.  */
> +  bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> +  tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> +
> +  /* We specifically don't check here whether the target supports the
> +     new operation, since it might be something that a later pattern
> +     wants to rewrite anyway.  If targets have a minimum element size
> +     for some optabs, we should pattern-match smaller ops to larger ops
> +     where beneficial.  */
> +  tree new_vectype = get_vectype_for_scalar_type (new_type);
> +  if (!new_vectype)
> +    return NULL;
>
> -  /* If we are in the middle of a sequence, we use DEF from a previous
> -     statement.  Otherwise, OPRND has to be a result of type promotion.  */
> -  if (*new_type)
> -    {
> -      half_type = *new_type;
> -      oprnd = def;
> -    }
> -  else
> +  if (dump_enabled_p ())
>      {
> -      first = true;
> -      if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> -                             &promotion)
> -         || !promotion
> -         || !vect_same_loop_or_bb_p (stmt, def_stmt))
> -        return false;
> +      dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> +      dump_printf (MSG_NOTE, " to ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> +      dump_printf (MSG_NOTE, "\n");
>      }
>
> -  /* Can we perform the operation on a smaller type?  */
> -  switch (code)
> -    {
> -      case BIT_IOR_EXPR:
> -      case BIT_XOR_EXPR:
> -      case BIT_AND_EXPR:
> -        if (!int_fits_type_p (const_oprnd, half_type))
> -          {
> -            /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
> -            if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> -              return false;
> -
> -            interm_type = build_nonstandard_integer_type (
> -                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> -            if (!int_fits_type_p (const_oprnd, interm_type))
> -              return false;
> -          }
> -
> -        break;
> -
> -      case LSHIFT_EXPR:
> -        /* Try intermediate type - HALF_TYPE is not enough for sure.  */
> -        if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> -          return false;
> -
> -        /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> -          (e.g., if the original value was char, the shift amount is at most 8
> -           if we want to use short).  */
> -        if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> -          return false;
> -
> -        interm_type = build_nonstandard_integer_type (
> -                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> -
> -        if (!vect_supportable_shift (code, interm_type))
> -          return false;
> -
> -        break;
> -
> -      case RSHIFT_EXPR:
> -        if (vect_supportable_shift (code, half_type))
> -          break;
> -
> -        /* Try intermediate type - HALF_TYPE is not supported.  */
> -        if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> -          return false;
> -
> -        interm_type = build_nonstandard_integer_type (
> -                        TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> -
> -        if (!vect_supportable_shift (code, interm_type))
> -          return false;
> -
> -        break;
> -
> -      default:
> -        gcc_unreachable ();
> -    }
> -
> -  /* There are four possible cases:
> -     1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> -        the first statement in the sequence)
> -        a. The original, HALF_TYPE, is not enough - we replace the promotion
> -           from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> -        b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> -           promotion.
> -     2. OPRND is defined by a pattern statement we created.
> -        a. Its type is not sufficient for the operation, we create a new stmt:
> -           a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
> -           this statement in NEW_DEF_STMT, and it is later put in
> -          STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> -        b. OPRND is good to use in the new statement.  */
> -  if (first)
> -    {
> -      if (interm_type)
> -        {
> -          /* Replace the original type conversion HALF_TYPE->TYPE with
> -             HALF_TYPE->INTERM_TYPE.  */
> -          if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> -            {
> -              new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> -              /* Check if the already created pattern stmt is what we need.  */
> -              if (!is_gimple_assign (new_stmt)
> -                  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> -                  || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> -                return false;
> -
> -             stmts->safe_push (def_stmt);
> -              oprnd = gimple_assign_lhs (new_stmt);
> -            }
> -          else
> -            {
> -              /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
> -              oprnd = gimple_assign_rhs1 (def_stmt);
> -             new_oprnd = make_ssa_name (interm_type);
> -             new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> -              STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> -              stmts->safe_push (def_stmt);
> -              oprnd = new_oprnd;
> -            }
> -        }
> -      else
> -        {
> -          /* Retrieve the operand before the type promotion.  */
> -          oprnd = gimple_assign_rhs1 (def_stmt);
> -        }
> -    }
> -  else
> -    {
> -      if (interm_type)
> -        {
> -          /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
> -         new_oprnd = make_ssa_name (interm_type);
> -         new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> -          oprnd = new_oprnd;
> -          *new_def_stmt = new_stmt;
> -        }
> +  /* Calculate the rhs operands for an operation on NEW_TYPE.  */
> +  STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> +  tree ops[3] = {};
> +  for (unsigned int i = 1; i < first_op; ++i)
> +    ops[i - 1] = gimple_op (last_stmt, i);
> +  vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> +                      new_type, &unprom[0], new_vectype);
> +
> +  /* Use the operation to produce a result of type NEW_TYPE.  */
> +  tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> +  gimple *pattern_stmt = gimple_build_assign (new_var, code,
> +                                             ops[0], ops[1], ops[2]);
> +  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> -      /* Otherwise, OPRND is already set.  */
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +                      "created pattern stmt: ");
> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
>      }
>
> -  if (interm_type)
> -    *new_type = interm_type;
> -  else
> -    *new_type = half_type;
> +  pattern_stmt = vect_convert_output (last_stmt_info, type,
> +                                     pattern_stmt, new_vectype);
>
> -  *op0 = oprnd;
> -  *op1 = fold_convert (*new_type, const_oprnd);
> -
> -  return true;
> +  stmts->safe_push (last_stmt);
> +  return pattern_stmt;
>  }
>
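A worked instance of the trade-offs described in the function above,
under its stated single-use assumptions:

  unsigned char a;   /* M1 = 8 bits  */
  unsigned short b;  /* M2 = 16 bits */
  int res = (int) a + (int) b;   /* N = 32 bits */

With both inputs single-use, option (b) wins: widen A to 16 bits, add
on 16 bits, and widen the result to 32, with no truncation entering
the chain.  For the instruction counting: with TYPE four times wider
than min_output_precision (int vs. char, N = 4), doing the addition in
TYPE and truncating costs 4 + 3 = 7 vector instructions per output
vector, while truncating the inputs first costs at most
(4 - 1) * 2 + 1 = 7 for the binary case and 4 for a unary operation,
hence "usually nothing to lose".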
> +/* Recognize cases in which the input to a cast is wider than its
> +   output, and the input is fed by a widening operation.  Fold this
> +   by removing the unnecessary intermediate widening.  E.g.:
>
> -/* Try to find a statement or a sequence of statements that can be performed
> -   on a smaller type:
> +     unsigned char a;
> +     unsigned int b = (unsigned int) a;
> +     unsigned short c = (unsigned short) b;
>
> -     type x_t;
> -     TYPE x_T, res0_T, res1_T;
> -   loop:
> -     S1  x_t = *p;
> -     S2  x_T = (TYPE) x_t;
> -     S3  res0_T = op (x_T, C0);
> -     S4  res1_T = op (res0_T, C1);
> -     S5  ... = () res1_T;  - type demotion
> -
> -   where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> -   constants.
> -   Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
> -   be 'type' or some intermediate type.  For now, we expect S5 to be a type
> -   demotion operation.  We also check that S3 and S4 have only one use.  */
> +   -->
>
> -static gimple *
> -vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> -{
> -  gimple *stmt = stmts->pop ();
> -  gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> -        *use_stmt = NULL;
> -  tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> -  tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> -  bool first;
> -  tree type = NULL;
> -
> -  first = true;
> -  while (1)
> -    {
> -      if (!vinfo_for_stmt (stmt)
> -          || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> -        return NULL;
> -
> -      new_def_stmt = NULL;
> -      if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> -                                             &op0, &op1, &new_def_stmt,
> -                                             stmts))
> -        {
> -          if (first)
> -            return NULL;
> -          else
> -            break;
> -        }
> +     unsigned short c = (unsigned short) a;
>
> -      /* STMT can be performed on a smaller type.  Check its uses.  */
> -      use_stmt = vect_single_imm_use (stmt);
> -      if (!use_stmt || !is_gimple_assign (use_stmt))
> -        return NULL;
> -
> -      /* Create pattern statement for STMT.  */
> -      vectype = get_vectype_for_scalar_type (new_type);
> -      if (!vectype)
> -        return NULL;
> -
> -      /* We want to collect all the statements for which we create pattern
> -         statetments, except for the case when the last statement in the
> -         sequence doesn't have a corresponding pattern statement.  In such
> -         case we associate the last pattern statement with the last statement
> -         in the sequence.  Therefore, we only add the original statement to
> -         the list if we know that it is not the last.  */
> -      if (prev_stmt)
> -        stmts->safe_push (prev_stmt);
> +   Although this is rare in input IR, it is an expected side-effect
> +   of the over-widening pattern above.
>
> -      var = vect_recog_temp_ssa_var (new_type, NULL);
> -      pattern_stmt
> -       = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> -      STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> -      new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
> +   This is also beneficial for integer-to-float conversions, if the
> +   widened integer has more bits than the float, and if the unwidened
> +   input doesn't.  */
>
> -      if (dump_enabled_p ())
> -        {
> -          dump_printf_loc (MSG_NOTE, vect_location,
> -                           "created pattern stmt: ");
> -          dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> -        }
> +static gimple *
> +vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> +{
> +  /* Check for a cast, including an integer-to-float conversion.  */
> +  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> +  if (!last_stmt)
> +    return NULL;
> +  tree_code code = gimple_assign_rhs_code (last_stmt);
> +  if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> +    return NULL;
>
> -      type = gimple_expr_type (stmt);
> -      prev_stmt = stmt;
> -      stmt = use_stmt;
> -
> -      first = false;
> -    }
> -
> -  /* We got a sequence.  We expect it to end with a type demotion operation.
> -     Otherwise, we quit (for now).  There are three possible cases: the
> -     conversion is to NEW_TYPE (we don't do anything), the conversion is to
> -     a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> -     NEW_TYPE differs (we create a new conversion statement).  */
> -  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> -    {
> -      use_lhs = gimple_assign_lhs (use_stmt);
> -      use_type = TREE_TYPE (use_lhs);
> -      /* Support only type demotion or signedess change.  */
> -      if (!INTEGRAL_TYPE_P (use_type)
> -         || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> -        return NULL;
> +  /* Make sure that the rhs is a scalar with a natural bitsize.  */
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +  if (!lhs)
> +    return NULL;
> +  tree lhs_type = TREE_TYPE (lhs);
> +  scalar_mode lhs_mode;
> +  if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> +      || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> +    return NULL;
>
> -      /* Check that NEW_TYPE is not bigger than the conversion result.  */
> -      if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> -       return NULL;
> +  /* Check for a narrowing operation (from a vector point of view).  */
> +  tree rhs = gimple_assign_rhs1 (last_stmt);
> +  tree rhs_type = TREE_TYPE (rhs);
> +  if (!INTEGRAL_TYPE_P (rhs_type)
> +      || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> +      || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> +    return NULL;
>
> -      if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> -          || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> -        {
> -         *type_out = get_vectype_for_scalar_type (use_type);
> -         if (!*type_out)
> -           return NULL;
> +  /* Try to find an unpromoted input.  */
> +  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> +  vec_info *vinfo = last_stmt_info->vinfo;
> +  vect_unpromoted_value unprom;
> +  if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> +      || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> +    return NULL;
>
> -          /* Create NEW_TYPE->USE_TYPE conversion.  */
> -         new_oprnd = make_ssa_name (use_type);
> -         pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> -          STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> -
> -          /* We created a pattern statement for the last statement in the
> -             sequence, so we don't need to associate it with the pattern
> -             statement created for PREV_STMT.  Therefore, we add PREV_STMT
> -             to the list in order to mark it later in vect_pattern_recog_1.  */
> -          if (prev_stmt)
> -            stmts->safe_push (prev_stmt);
> -        }
> -      else
> -        {
> -          if (prev_stmt)
> -           STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> -              = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
> +  /* If the bits above RHS_TYPE matter, make sure that they're the
> +     same when extending from UNPROM as they are when extending from RHS.  */
> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +      && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> +    return NULL;
>
> -         *type_out = vectype;
> -        }
> +  /* We can get the same result by casting UNPROM directly, to avoid
> +     the unnecessary widening and narrowing.  */
> +  vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> -      stmts->safe_push (use_stmt);
> -    }
> -  else
> -    /* TODO: support general case, create a conversion to the correct type.  */
> +  *type_out = get_vectype_for_scalar_type (lhs_type);
> +  if (!*type_out)
>      return NULL;
>
> -  /* Pattern detected.  */
> -  vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
> +  tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> +  gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
> +  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> +  stmts->safe_push (last_stmt);
>    return pattern_stmt;
>  }
>
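A sketch of the integer-to-float case the comment mentions, assuming
32-bit unsigned int, 64-bit unsigned long long and a 32-bit float mode
(cast_demo is a hypothetical example, not part of the patch):

  float
  cast_demo (unsigned int a)
  {
    unsigned long long w = (unsigned long long) a;  /* widening */
    return (float) w;
  }

Here rhs_type (64 bits) is wider than the float mode (32 bits) but the
unpromoted input (32 bits) is not, and both integer types are
unsigned, so the pattern can legitimately fold this to "(float) a".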
> @@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<g
>    return pattern_stmt;
>  }
>
> +/* Return true if TYPE is a non-boolean integer type.  These are the types
> +   that we want to consider for narrowing.  */
> +
> +static bool
> +vect_narrowable_type_p (tree type)
> +{
> +  return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> +}
> +
> +/* Return true if the operation given by CODE can be truncated to N bits
> +   when only N bits of the output are needed.  This is only true if bit N+1
> +   of the inputs has no effect on the low N bits of the result.  */
> +
> +static bool
> +vect_truncatable_operation_p (tree_code code)
> +{
> +  switch (code)
> +    {
> +    case PLUS_EXPR:
> +    case MINUS_EXPR:
> +    case MULT_EXPR:
> +    case BIT_AND_EXPR:
> +    case BIT_IOR_EXPR:
> +    case BIT_XOR_EXPR:
> +    case COND_EXPR:
> +      return true;
> +
> +    default:
> +      return false;
> +    }
> +}
> +
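A self-contained sanity check of the property being encoded, with
assumed 16-bit inputs narrowed to 8 bits (truncatable_demo is
illustrative only):

  #include <assert.h>

  static void
  truncatable_demo (unsigned short x, unsigned short y)
  {
    /* PLUS_EXPR qualifies: the low 8 bits of the sum depend only on
       the low 8 bits of the operands.  */
    assert ((unsigned char) (x + y)
            == (unsigned char) ((unsigned char) x + (unsigned char) y));

    /* Shifts rightly aren't in the list: bit 10 of X feeds bit 7 of
       X >> 3, so the low 8 bits of the result depend on more than the
       low 8 bits of the input.  */
  }

Shifts are instead handled precisely in
vect_determine_precisions_from_users below.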
> +/* Record that STMT_INFO could be changed from operating on TYPE to
> +   operating on a type with the precision and sign given by PRECISION
> +   and SIGN respectively.  PRECISION is an arbitrary bit precision;
> +   it might not be a whole number of bytes.  */
> +
> +static void
> +vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> +                        unsigned int precision, signop sign)
> +{
> +  /* Round the precision up to a whole number of bytes.  */
> +  precision = vect_element_precision (precision);
> +  if (precision < TYPE_PRECISION (type)
> +      && (!stmt_info->operation_precision
> +         || stmt_info->operation_precision > precision))
> +    {
> +      stmt_info->operation_precision = precision;
> +      stmt_info->operation_sign = sign;
> +    }
> +}
> +
> +/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> +   non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
> +   is an arbitrary bit precision; it might not be a whole number of bytes.  */
> +
> +static void
> +vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> +                             unsigned int min_input_precision)
> +{
> +  /* This operation in isolation only requires the inputs to have
> +     MIN_INPUT_PRECISION of precision.  However, that doesn't mean
> +     that MIN_INPUT_PRECISION is a natural precision for the chain
> +     as a whole.  E.g. consider something like:
> +
> +        unsigned short *x, *y;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +     The right shift can be done on unsigned chars, and only requires the
> +     result of "*x & 0xf0" to be done on unsigned chars.  But taking that
> +     approach would mean turning a natural chain of single-vector unsigned
> +     short operations into one that truncates "*x" and then extends
> +     "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> +     operation and one vector for each unsigned char operation.
> +     This would be a significant pessimization.
> +
> +     Instead only propagate the maximum of this precision and the precision
> +     required by the users of the result.  This means that we don't pessimize
> +     the case above but continue to optimize things like:
> +
> +        unsigned char *y;
> +        unsigned short *x;
> +        *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> +     Here we would truncate two vectors of *x to a single vector of
> +     unsigned chars and use single-vector unsigned char operations for
> +     everything else, rather than doing two unsigned short copies of
> +     "(*x & 0xf0) >> 4" and then truncating the result.  */
> +  min_input_precision = MAX (min_input_precision,
> +                            stmt_info->min_output_precision);
> +
> +  if (min_input_precision < TYPE_PRECISION (type)
> +      && (!stmt_info->min_input_precision
> +         || stmt_info->min_input_precision > min_input_precision))
> +    stmt_info->min_input_precision = min_input_precision;
> +}
> +
> +/* Subroutine of vect_determine_min_output_precision.  Return true if
> +   we can calculate a reduced number of output bits for STMT_INFO,
> +   whose result is LHS.  */
> +
> +static bool
> +vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> +{
> +  /* Take the maximum precision required by users of the result.  */
> +  unsigned int precision = 0;
> +  imm_use_iterator iter;
> +  use_operand_p use;
> +  FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> +    {
> +      gimple *use_stmt = USE_STMT (use);
> +      if (is_gimple_debug (use_stmt))
> +       continue;
> +      if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> +       return false;
> +      stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> +      if (!use_stmt_info->min_input_precision)
> +       return false;
> +      precision = MAX (precision, use_stmt_info->min_input_precision);
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> +                      precision);
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> +      dump_printf (MSG_NOTE, " are significant\n");
> +    }
> +  stmt_info->min_output_precision = precision;
> +  return true;
> +}
> +
> +/* Calculate min_output_precision for STMT_INFO.  */
> +
> +static void
> +vect_determine_min_output_precision (stmt_vec_info stmt_info)
> +{
> +  /* We're only interested in statements with a narrowable result.  */
> +  tree lhs = gimple_get_lhs (stmt_info->stmt);
> +  if (!lhs
> +      || TREE_CODE (lhs) != SSA_NAME
> +      || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> +    return;
> +
> +  if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> +    stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> +}
> +
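For example (hypothetical statements, assuming all users are inside
the vectorizable region):

  int t;                                 /* some 32-bit computation */
  unsigned char c = (unsigned char) t;   /* needs the low 8 bits */
  unsigned short s = (unsigned short) t; /* needs the low 16 bits */

gives T a min_output_precision of MAX (8, 16) = 16, while any use
outside the region makes the walk bail out and keep the full 32 bits.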
> +/* Use range information to decide whether STMT (described by STMT_INFO)
> +   could be done in a narrower type.  This is effectively a forward
> +   propagation, since it uses context-independent information that applies
> +   to all users of an SSA name.  */
> +
> +static void
> +vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> +{
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> +    return;
> +
> +  tree type = TREE_TYPE (lhs);
> +  if (!vect_narrowable_type_p (type))
> +    return;
> +
> +  /* First see whether we have any useful range information for the result.  */
> +  unsigned int precision = TYPE_PRECISION (type);
> +  signop sign = TYPE_SIGN (type);
> +  wide_int min_value, max_value;
> +  if (!vect_get_range_info (lhs, &min_value, &max_value))
> +    return;
> +
> +  tree_code code = gimple_assign_rhs_code (stmt);
> +  unsigned int nops = gimple_num_ops (stmt);
> +
> +  if (!vect_truncatable_operation_p (code))
> +    /* Check that all relevant input operands are compatible, and update
> +       [MIN_VALUE, MAX_VALUE] to include their ranges.  */
> +    for (unsigned int i = 1; i < nops; ++i)
> +      {
> +       tree op = gimple_op (stmt, i);
> +       if (TREE_CODE (op) == INTEGER_CST)
> +         {
> +           /* Don't require the integer to have TYPE (which it might
> +              not for things like shift amounts, etc.), but do require
> +              it to fit TYPE.  */
> +           if (!int_fits_type_p (op, type))
> +             return;
> +
> +           min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> +           max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> +         }
> +       else if (TREE_CODE (op) == SSA_NAME)
> +         {
> +           /* Ignore codes that don't take uniform arguments.  */
> +           if (!types_compatible_p (TREE_TYPE (op), type))
> +             return;
> +
> +           wide_int op_min_value, op_max_value;
> +           if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> +             return;
> +
> +           min_value = wi::min (min_value, op_min_value, sign);
> +           max_value = wi::max (max_value, op_max_value, sign);
> +         }
> +       else
> +         return;
> +      }
> +
> +  /* Try to switch signed types for unsigned types if we can.
> +     This is better for two reasons.  First, unsigned ops tend
> +     to be cheaper than signed ops.  Second, it means that we can
> +     handle things like:
> +
> +       signed char c;
> +       int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> +     as:
> +
> +       signed char c;
> +       unsigned short res_1 = (unsigned short) c & 0xff00;
> +       int res = (int) res_1;
> +
> +     where the intermediate result res_1 has unsigned rather than
> +     signed type.  */
> +  if (sign == SIGNED && !wi::neg_p (min_value))
> +    sign = UNSIGNED;
> +
> +  /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
> +  unsigned int precision1 = wi::min_precision (min_value, sign);
> +  unsigned int precision2 = wi::min_precision (max_value, sign);
> +  unsigned int value_precision = MAX (precision1, precision2);
> +  if (value_precision >= precision)
> +    return;
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                      " without loss of precision: ",
> +                      sign == SIGNED ? "signed" : "unsigned",
> +                      value_precision);
> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +    }
> +
> +  vect_set_operation_type (stmt_info, type, value_precision, sign);
> +  vect_set_min_input_precision (stmt_info, type, value_precision);
> +}
> +
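For the non-truncatable path, a hedged example of why the operand
ranges have to be merged in:

  unsigned short q = x / y;

Even when Q itself always fits in 8 bits, the low bits of a quotient
depend on the high bits of the inputs: 300 / 100 = 3, but truncating
the inputs to 8 bits first gives (300 & 0xff) / 100 = 44 / 100 = 0.
Widening [MIN_VALUE, MAX_VALUE] to cover the operand ranges rules such
cases out.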
> +/* Use information about the users of STMT's result to decide whether
> +   STMT (described by STMT_INFO) could be done in a narrower type.
> +   This is effectively a backward propagation.  */
> +
> +static void
> +vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> +{
> +  tree_code code = gimple_assign_rhs_code (stmt);
> +  unsigned int opno = (code == COND_EXPR ? 2 : 1);
> +  tree type = TREE_TYPE (gimple_op (stmt, opno));
> +  if (!vect_narrowable_type_p (type))
> +    return;
> +
> +  unsigned int precision = TYPE_PRECISION (type);
> +  unsigned int operation_precision, min_input_precision;
> +  switch (code)
> +    {
> +    CASE_CONVERT:
> +      /* Only the bits that contribute to the output matter.  Don't change
> +        the precision of the operation itself.  */
> +      operation_precision = precision;
> +      min_input_precision = stmt_info->min_output_precision;
> +      break;
> +
> +    case LSHIFT_EXPR:
> +    case RSHIFT_EXPR:
> +      {
> +       tree shift = gimple_assign_rhs2 (stmt);
> +       if (TREE_CODE (shift) != INTEGER_CST
> +           || !wi::ltu_p (wi::to_widest (shift), precision))
> +         return;
> +       unsigned int const_shift = TREE_INT_CST_LOW (shift);
> +       if (code == LSHIFT_EXPR)
> +         {
> +           /* We need CONST_SHIFT fewer bits of the input.  */
> +           operation_precision = stmt_info->min_output_precision;
> +           min_input_precision = (MAX (operation_precision, const_shift)
> +                                   - const_shift);
> +         }
> +       else
> +         {
> +           /* We need CONST_SHIFT extra bits to do the operation.  */
> +           operation_precision = (stmt_info->min_output_precision
> +                                  + const_shift);
> +           min_input_precision = operation_precision;
> +         }
> +       break;
> +      }
> +
> +    default:
> +      if (vect_truncatable_operation_p (code))
> +       {
> +         /* Input bit N has no effect on output bits N-1 and lower.  */
> +         operation_precision = stmt_info->min_output_precision;
> +         min_input_precision = operation_precision;
> +         break;
> +       }
> +      return;
> +    }
> +
> +  if (operation_precision < precision)
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> +                          " without affecting users: ",
> +                          TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> +                          operation_precision);
> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> +       }
> +      vect_set_operation_type (stmt_info, type, operation_precision,
> +                              TYPE_SIGN (type));
> +    }
> +  vect_set_min_input_precision (stmt_info, type, min_input_precision);
> +}
> +
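Worked numbers for the two shift cases, assuming 32-bit int operands:

  y = x << 3;   /* only the low 16 bits of Y needed:
                   operation_precision = 16
                   shift-local input requirement
                     = MAX (16, 3) - 3 = 13 bits,
                   which vect_set_min_input_precision then raises back
                   to 16, the precision required by users of Y  */

  y = x >> 3;   /* only the low 8 bits of Y needed:
                   operation_precision = 8 + 3 = 11
                   min_input_precision = 11  */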
> +/* Handle vect_determine_precisions for STMT_INFO, given that we
> +   have already done so for the users of its result.  */
> +
> +void
> +vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> +{
> +  vect_determine_min_output_precision (stmt_info);
> +  if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> +    {
> +      vect_determine_precisions_from_range (stmt_info, stmt);
> +      vect_determine_precisions_from_users (stmt_info, stmt);
> +    }
> +}
> +
> +/* Walk backwards through the vectorizable region to determine the
> +   values of these fields:
> +
> +   - min_output_precision
> +   - min_input_precision
> +   - operation_precision
> +   - operation_sign.  */
> +
> +void
> +vect_determine_precisions (vec_info *vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> +  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> +    {
> +      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> +      unsigned int nbbs = loop->num_nodes;
> +
> +      for (unsigned int i = 0; i < nbbs; i++)
> +       {
> +         basic_block bb = bbs[nbbs - i - 1];
> +         for (gimple_stmt_iterator si = gsi_last_bb (bb);
> +              !gsi_end_p (si); gsi_prev (&si))
> +           vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> +       }
> +    }
> +  else
> +    {
> +      bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> +      gimple_stmt_iterator si = bb_vinfo->region_end;
> +      gimple *stmt;
> +      do
> +       {
> +         if (!gsi_stmt (si))
> +           si = gsi_last_bb (bb_vinfo->bb);
> +         else
> +           gsi_prev (&si);
> +         stmt = gsi_stmt (si);
> +         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +         if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> +           vect_determine_stmt_precisions (stmt_info);
> +       }
> +      while (stmt != gsi_stmt (bb_vinfo->region_begin));
> +    }
> +}
> +
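Since both walks go backwards, each statement's users have been
through vect_determine_stmt_precisions before the statement itself, so
the min_input_precision values that vect_determine_min_output_precision
consumes are already in place.  E.g. for:

  t1 = a + b;
  t2 = t1 >> 1;

the walk visits t2 first, recording the input precision for t1's use
before t1 itself is processed.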
>  typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
>  struct vect_recog_func
> @@ -4217,13 +4566,14 @@ struct vect_recog_func
>     taken, which means that usually the more complex one needs to precede the
>     less complex ones (widen_sum only after dot_prod or sad, for example).  */
>  static vect_recog_func vect_vect_recog_func_ptrs[] = {
> +  { vect_recog_over_widening_pattern, "over_widening" },
> +  { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
>    { vect_recog_widen_mult_pattern, "widen_mult" },
>    { vect_recog_dot_prod_pattern, "dot_prod" },
>    { vect_recog_sad_pattern, "sad" },
>    { vect_recog_widen_sum_pattern, "widen_sum" },
>    { vect_recog_pow_pattern, "pow" },
>    { vect_recog_widen_shift_pattern, "widen_shift" },
> -  { vect_recog_over_widening_pattern, "over_widening" },
>    { vect_recog_rotate_pattern, "rotate" },
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
> @@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
>    unsigned int i, j;
>    auto_vec<gimple *, 1> stmts_to_replace;
>
> +  vect_determine_precisions (vinfo);
> +
>    DUMP_VECT_SCOPE ("vect_pattern_recog");
>
>    if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c  2016-11-11 17:07:36.776796115 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c  2018-07-03 09:02:36.567413531 +0100
> @@ -43,5 +43,5 @@ int main (void)
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
>  /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
> -/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
> +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c       2018-07-03 09:02:36.563413564 +0100
> @@ -62,8 +62,9 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c     2018-07-03 09:02:36.563413564 +0100
> @@ -58,7 +58,9 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c       2018-07-03 09:02:36.563413564 +0100
> @@ -57,7 +57,12 @@ int main (void)
>    return 0;
>  }
>
> -/* Final value stays in int, so no over-widening is detected at the moment.  */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> +/* This is an over-widening even though the final result is still an int.
> +   It's better to do one vector of ops on chars and then widen than to
> +   widen and then do 4 vectors of ops on ints.  */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c     2018-07-03 09:02:36.563413564 +0100
> @@ -57,7 +57,12 @@ int main (void)
>    return 0;
>  }
>
> -/* Final value stays in int, so no over-widening is detected at the moment.  */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> +/* This is an over-widening even though the final result is still an int.
> +   It's better to do one vector of ops on chars and then widen than to
> +   widen and then do 4 vectors of ops on ints.  */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c       2018-07-03 09:02:36.563413564 +0100
> @@ -57,6 +57,9 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c     2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c     2018-07-03 09:02:36.563413564 +0100
> @@ -59,7 +59,9 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c       2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c       2018-07-03 09:02:36.563413564 +0100
> @@ -66,8 +66,9 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c     2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c     2018-07-03 09:02:36.563413564 +0100
> @@ -62,7 +62,9 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
> ===================================================================
> --- /dev/null   2018-06-13 14:36:57.192460992 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c     2018-07-03 09:02:36.563413564 +0100
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-require-effective-target vect_pack_trunc } */
> +/* { dg-require-effective-target vect_unpack } */
> +
> +#include "tree-vect.h"
> +
> +/* Deliberate use of signed >>.  */
> +#define DEF_LOOP(SIGNEDNESS)                   \
> +  void __attribute__ ((noipa))                 \
> +  f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
> +                 SIGNEDNESS char *restrict b,  \
> +                 SIGNEDNESS char *restrict c)  \
> +  {                                            \
> +    a[0] = (b[0] + c[0]) >> 1;                 \
> +    a[1] = (b[1] + c[1]) >> 1;                 \
> +    a[2] = (b[2] + c[2]) >> 1;                 \
> +    a[3] = (b[3] + c[3]) >> 1;                 \
> +    a[4] = (b[4] + c[4]) >> 1;                 \
> +    a[5] = (b[5] + c[5]) >> 1;                 \
> +    a[6] = (b[6] + c[6]) >> 1;                 \
> +    a[7] = (b[7] + c[7]) >> 1;                 \
> +    a[8] = (b[8] + c[8]) >> 1;                 \
> +    a[9] = (b[9] + c[9]) >> 1;                 \
> +    a[10] = (b[10] + c[10]) >> 1;              \
> +    a[11] = (b[11] + c[11]) >> 1;              \
> +    a[12] = (b[12] + c[12]) >> 1;              \
> +    a[13] = (b[13] + c[13]) >> 1;              \
> +    a[14] = (b[14] + c[14]) >> 1;              \
> +    a[15] = (b[15] + c[15]) >> 1;              \
> +  }
> +
> +DEF_LOOP (signed)
> +DEF_LOOP (unsigned)
> +
> +#define N 16
> +
> +#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C)          \
> +  {                                                    \
> +    SIGNEDNESS char a[N], b[N], c[N];                  \
> +    for (int i = 0; i < N; ++i)


* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-03 20:08       ` Christophe Lyon
@ 2018-07-03 20:39         ` Rainer Orth
  2018-07-04  7:18         ` Richard Sandiford
  1 sibling, 0 replies; 10+ messages in thread
From: Rainer Orth @ 2018-07-03 20:39 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Biener, gcc Patches, Richard Sandiford

Hi Christophe,

> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
> does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "basic block vectorized" 2

indeed, but that's not enough: adding

/* { dg-additional-options "-fdump-tree-vect-details" } */

to both affected tests (gcc.dg/vect/bb-slp-over-widen-[12].c) yields

FAIL: gcc.dg/vect/bb-slp-over-widen-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "basic block vectorized" 2
FAIL: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects  scan-tree-dump-times vect "basic block vectorized" 2

on both 32- and 64-bit x86, and the dump contains:

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:60:3: note:   not vectorized: control flow in loop.
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:60:3: note:  not vectorized: loop contains function calls or data references that cannot be analyzed
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:59:3: note:   not vectorized: control flow in loop.
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:59:3: note:  not vectorized: loop contains function calls or data references that cannot be analyzed
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:55:1: note: vectorized 0 loops in function.
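
Since these are basic-block tests, the straight-line functions give the
loop vectorizer nothing to do: the "basic block vectorized" message
comes from the SLP pass, whose dump is "slp2" rather than "vect".  A
sketch of the directive that would presumably match instead (assuming
the same "slp2" dump name that the demotion scans in these files
already use):

/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */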

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


* Re: [14/n] PR85694: Rework overwidening detection
  2018-07-03 20:08       ` Christophe Lyon
  2018-07-03 20:39         ` Rainer Orth
@ 2018-07-04  7:18         ` Richard Sandiford
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2018-07-04  7:18 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Biener, gcc Patches

Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> > [...]
>> >>
>> >> Here's a version rebased on top of current trunk.  Changes from last time:
>> >>
>> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> >>   prototype
>> >>
>> >> - fix a typo in a comment
>> >>
>> >> - use vect_element_precision from the new version of 12/n.
>> >>
>> >> Tested as before.  OK to install?
>> >
>> > OK.
>>
>> Thanks.  For the record, here's what I installed (updated on top of
>> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>>
>> Richard
>>
> Hi,
>
> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
> does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "basic block vectorized" 2

I've applied the following as obvious.

Richard


2018-07-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/testsuite/
	* gcc.dg/vect/bb-slp-over-widen-1.c: Fix name of dump file for
	final scan test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c	2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c	2018-07-04 08:16:36.210113069 +0100
@@ -63,4 +63,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c	2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c	2018-07-04 08:16:36.210113069 +0100
@@ -62,4 +62,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
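
(For reference, a typical way to re-run just the affected tests from a
built GCC tree, with the test glob assumed, is:

make check-gcc RUNTESTFLAGS="vect.exp=bb-slp-over-widen-*"

after which the UNRESOLVED and FAIL results above are expected to
become PASSes.)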


Thread overview: 10+ messages
2018-06-20 10:37 [14/n] PR85694: Rework overwidening detection Richard Sandiford
2018-06-29 12:56 ` Richard Sandiford
2018-07-02 11:02   ` Christophe Lyon
2018-07-02 13:37     ` Richard Sandiford
2018-07-02 13:52       ` Christophe Lyon
2018-07-02 13:12   ` Richard Biener
2018-07-03 10:02     ` Richard Sandiford
2018-07-03 20:08       ` Christophe Lyon
2018-07-03 20:39         ` Rainer Orth
2018-07-04  7:18         ` Richard Sandiford
