* [14/n] PR85694: Rework overwidening detection
@ 2018-06-20 10:37 Richard Sandiford
2018-06-29 12:56 ` Richard Sandiford
0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2018-06-20 10:37 UTC (permalink / raw)
To: gcc-patches
This patch is the main part of PR85694. The aim is to recognise at least:
signed char *a, *b, *c;
...
for (int i = 0; i < 2048; i++)
c[i] = (a[i] + b[i]) >> 1;
as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints. However, the patch ended up being a lot
more general than that.
The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant. These cases are enough for common pixel-format conversion
and can be detected in a peephole way.
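For reference, a minimal (hypothetical) sketch of the kind of loop the
existing peephole detection already handles, where the only over-wide
operation is a shift by a constant:

```c
#include <stdint.h>

/* Hypothetical example of the currently-detected case: src[i] is
   promoted to int before the shift, but because the second operand is
   a constant, the shift can be done on a narrower type.  */
void
scale_pixels (uint8_t *dst, const uint8_t *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] >> 1;
}
```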
The loop above requires two generalisations of the current code: support
for addition as well as logical ops, and support for non-constant second
operands. These are harder to detect in the same peephole way, so the
patch tries to take a more global approach.
The idea is to get information about the minimum operation width
in two ways:
(1) by using the range information attached to the SSA_NAMEs
(effectively a forward walk, since the range info is
context-independent).
(2) by back-propagating the number of output bits required by
users of the result.
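As a hypothetical illustration of (1): in the loop below, the masks
attach range information ([0, 0xff]) to each operand, so the addition
is known to fit in 9 bits and could be done on 16-bit elements even
though the operands are ints.

```c
#include <stdint.h>

/* Hypothetical example of range-driven narrowing: both addends are
   known to be in [0, 0xff] from the masks, so the sum fits in 9 bits
   and the addition can be narrowed below the 32-bit source type.  */
void
add_masked (int16_t *c, const int32_t *a, const int32_t *b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = (a[i] & 0xff) + (b[i] & 0xff);
}
```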
As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code. The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that if:
- no operations later in the chain require more than N bits; or
- all internally-defined inputs are extended from N bits or fewer,
and at least one of them is single-use.
See the comments for the rationale.
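To illustrate the second condition, here is a hand-narrowed version of
the motivating loop (a hypothetical sketch of what the pattern
effectively produces): both inputs are single-use extensions from
8 bits, so the addition and shift can be done on 16 bits with a single
truncation at the end.

```c
#include <stdint.h>

/* Hypothetical hand-narrowed form of the motivating loop: the int
   promotions are replaced with promotions to int16_t, which is wide
   enough that the add and the shift keep their semantics.  */
void
avg (int8_t *c, const int8_t *a, const int8_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      int16_t aw = (int16_t) a[i];   /* was (int) a[i] */
      int16_t bw = (int16_t) b[i];   /* was (int) b[i] */
      c[i] = (int8_t) ((aw + bw) >> 1);
    }
}
```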
I didn't bother adding STMT_VINFO_* wrappers for the new fields,
since the code seemed more readable without them.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Richard
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_dec, dump_hex): Declare.
* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information. Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.
gcc/testsuite/
* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
over-widening messages.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-5.c: Likewise.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* gcc.dg/vect/vect-over-widen-16.c: Likewise.
* gcc.dg/vect/vect-over-widen-17.c: Likewise.
* gcc.dg/vect/vect-over-widen-18.c: Likewise.
* gcc.dg/vect/vect-over-widen-19.c: Likewise.
* gcc.dg/vect/vect-over-widen-20.c: Likewise.
* gcc.dg/vect/vect-over-widen-21.c: Likewise.
Index: gcc/poly-int.h
===================================================================
*** gcc/poly-int.h 2018-06-20 11:36:19.000000000 +0100
--- gcc/poly-int.h 2018-06-20 11:36:20.135890693 +0100
*************** print_dec (const poly_int_pod<N, C> &val
*** 2420,2425 ****
--- 2420,2444 ----
poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
}
+ /* Use print_hex to print VALUE to FILE. */
+
+ template<unsigned int N, typename C>
+ void
+ print_hex (const poly_int_pod<N, C> &value, FILE *file)
+ {
+ if (value.is_constant ())
+ print_hex (value.coeffs[0], file);
+ else
+ {
+ fprintf (file, "[");
+ for (unsigned int i = 0; i < N; ++i)
+ {
+ print_hex (value.coeffs[i], file);
+ fputc (i == N - 1 ? ']' : ',', file);
+ }
+ }
+ }
+
/* Helper for calculating the distance between two points P1 and P2,
in cases where known_le (P1, P2). T1 and T2 are the types of the
two positions, in either order. The coefficients of P2 - P1 have
Index: gcc/dumpfile.h
===================================================================
*** gcc/dumpfile.h 2018-06-20 11:36:19.000000000 +0100
--- gcc/dumpfile.h 2018-06-20 11:36:20.131890728 +0100
*************** extern bool enable_rtl_dump_file (void);
*** 288,293 ****
--- 288,295 ----
template<unsigned int N, typename C>
void dump_dec (dump_flags_t, const poly_int<N, C> &);
+ extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+ extern void dump_hex (dump_flags_t, const poly_wide_int &);
/* In tree-dump.c */
extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
*** gcc/dumpfile.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/dumpfile.c 2018-06-20 11:36:20.131890728 +0100
*************** template void dump_dec (dump_flags_t, co
*** 512,517 ****
--- 512,539 ----
template void dump_dec (dump_flags_t, const poly_offset_int &);
template void dump_dec (dump_flags_t, const poly_widest_int &);
+ void
+ dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+ {
+ if (dump_file && (dump_kind & pflags))
+ print_dec (value, dump_file, sgn);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_dec (value, alt_dump_file, sgn);
+ }
+
+ /* Output VALUE in hexadecimal to appropriate dump streams. */
+
+ void
+ dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+ {
+ if (dump_file && (dump_kind & pflags))
+ print_hex (value, dump_file);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_hex (value, alt_dump_file);
+ }
+
/* Start a dump for PHASE. Store user-supplied dump flags in
*FLAG_PTR. Return the number of streams opened. Set globals
DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h 2018-06-20 11:36:19.000000000 +0100
--- gcc/tree-vectorizer.h 2018-06-20 11:36:20.139890658 +0100
*************** typedef struct _stmt_vec_info {
*** 872,877 ****
--- 872,892 ----
/* The number of scalar stmt references from active SLP instances. */
unsigned int num_slp_uses;
+
+ /* If nonzero, the lhs of the statement could be truncated to this
+ many bits without affecting any users of the result. */
+ unsigned int min_output_precision;
+
+ /* If nonzero, all non-boolean input operands have the same precision,
+ and they could each be truncated to this many bits without changing
+ the result. */
+ unsigned int min_input_precision;
+
+ /* If OPERATION_BITS is nonzero, the statement could be performed on
+ an integer with the sign and number of bits given by OPERATION_SIGN
+ and OPERATION_BITS without changing the result. */
+ unsigned int operation_precision;
+ signop operation_sign;
} *stmt_vec_info;
/* Information about a gather/scatter call. */
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/tree-vect-patterns.c 2018-06-20 11:36:20.139890658 +0100
*************** Software Foundation; either version 3, o
*** 47,52 ****
--- 47,86 ----
#include "omp-simd-clone.h"
#include "predict.h"
+ /* Return true if we have a useful VR_RANGE range for VAR, storing it
+ in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
+
+ static bool
+ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+ {
+ value_range_type vr_type = get_range_info (var, min_value, max_value);
+ wide_int nonzero = get_nonzero_bits (var);
+ signop sgn = TYPE_SIGN (TREE_TYPE (var));
+ if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+ nonzero, sgn) == VR_RANGE)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has range [");
+ dump_hex (MSG_NOTE, *min_value);
+ dump_printf (MSG_NOTE, ", ");
+ dump_hex (MSG_NOTE, *max_value);
+ dump_printf (MSG_NOTE, "]\n");
+ }
+ return true;
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has no range info\n");
+ }
+ return false;
+ }
+ }
+
/* Report that we've found an instance of pattern PATTERN in
statement STMT. */
*************** vect_supportable_direct_optab_p (tree ot
*** 190,229 ****
return true;
}
- /* Check whether STMT2 is in the same loop or basic block as STMT1.
- Which of the two applies depends on whether we're currently doing
- loop-based or basic-block-based vectorization, as determined by
- the vinfo_for_stmt for STMT1 (which must be defined).
-
- If this returns true, vinfo_for_stmt for STMT2 is guaranteed
- to be defined as well. */
-
- static bool
- vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
- {
- stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
- return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
- }
-
- /* If the LHS of DEF_STMT has a single use, and that statement is
- in the same loop or basic block, return it. */
-
- static gimple *
- vect_single_imm_use (gimple *def_stmt)
- {
- tree lhs = gimple_assign_lhs (def_stmt);
- use_operand_p use_p;
- gimple *use_stmt;
-
- if (!single_imm_use (lhs, &use_p, &use_stmt))
- return NULL;
-
- if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
- return NULL;
-
- return use_stmt;
- }
-
/* If OP is defined by a statement that's being considered for vectorization,
return information about that statement, otherwise return NULL. */
--- 224,229 ----
*************** vect_unpromoted_value::set_op (tree op_i
*** 341,347 ****
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
! input in UNPROM, otherwise describe OP' in UNPROM.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
--- 341,349 ----
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
! input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
! is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
! have more than one user.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
*************** vect_unpromoted_value::set_op (tree op_i
*** 368,374 ****
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! vect_unpromoted_value *unprom)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
--- 370,377 ----
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! vect_unpromoted_value *unprom,
! bool *single_use_p = NULL)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
*************** vect_look_through_possible_promotion (ve
*** 417,422 ****
--- 420,430 ----
{
def_stmt = vect_look_through_pattern (def_stmt);
caster = vinfo_for_stmt (def_stmt);
+ /* Ignore pattern statements, since we don't link uses for them. */
+ if (single_use_p
+ && !STMT_VINFO_RELATED_STMT (caster)
+ && !has_single_use (res))
+ *single_use_p = false;
}
else
caster = NULL;
*************** vect_recog_widen_sum_pattern (vec<gimple
*** 1307,1669 ****
return pattern_stmt;
}
! /* Return TRUE if the operation in STMT can be performed on a smaller type.
!
! Input:
! STMT - a statement to check.
! DEF - we support operations with two operands, one of which is constant.
! The other operand can be defined by a demotion operation, or by a
! previous statement in a sequence of over-promoted operations. In the
! later case DEF is used to replace that operand. (It is defined by a
! pattern statement we created for the previous statement in the
! sequence).
!
! Input/output:
! NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
! NULL, it's the type of DEF.
! STMTS - additional pattern statements. If a pattern statement (type
! conversion) is created in this function, its original statement is
! added to STMTS.
! Output:
! OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
! operands to use in the new pattern statement for STMT (will be created
! in vect_recog_over_widening_pattern ()).
! NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
! statements for STMT: the first one is a type promotion and the second
! one is the operation itself. We return the type promotion statement
! in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
! the second pattern statement. */
! static bool
! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
! tree *op0, tree *op1, gimple **new_def_stmt,
! vec<gimple *> *stmts)
! {
! enum tree_code code;
! tree const_oprnd, oprnd;
! tree interm_type = NULL_TREE, half_type, new_oprnd, type;
! gimple *def_stmt, *new_stmt;
! bool first = false;
! bool promotion;
! *op0 = NULL_TREE;
! *op1 = NULL_TREE;
! *new_def_stmt = NULL;
! if (!is_gimple_assign (stmt))
! return false;
! code = gimple_assign_rhs_code (stmt);
! if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
! && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
! return false;
! oprnd = gimple_assign_rhs1 (stmt);
! const_oprnd = gimple_assign_rhs2 (stmt);
! type = gimple_expr_type (stmt);
! if (TREE_CODE (oprnd) != SSA_NAME
! || TREE_CODE (const_oprnd) != INTEGER_CST)
! return false;
! /* If oprnd has other uses besides that in stmt we cannot mark it
! as being part of a pattern only. */
! if (!has_single_use (oprnd))
! return false;
! /* If we are in the middle of a sequence, we use DEF from a previous
! statement. Otherwise, OPRND has to be a result of type promotion. */
! if (*new_type)
! {
! half_type = *new_type;
! oprnd = def;
! }
! else
{
! first = true;
! if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
! &promotion)
! || !promotion
! || !vect_same_loop_or_bb_p (stmt, def_stmt))
! return false;
}
! /* Can we perform the operation on a smaller type? */
! switch (code)
! {
! case BIT_IOR_EXPR:
! case BIT_XOR_EXPR:
! case BIT_AND_EXPR:
! if (!int_fits_type_p (const_oprnd, half_type))
! {
! /* HALF_TYPE is not enough. Try a bigger type if possible. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! if (!int_fits_type_p (const_oprnd, interm_type))
! return false;
! }
!
! break;
!
! case LSHIFT_EXPR:
! /* Try intermediate type - HALF_TYPE is not enough for sure. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
! (e.g., if the original value was char, the shift amount is at most 8
! if we want to use short). */
! if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!
! if (!vect_supportable_shift (code, interm_type))
! return false;
!
! break;
!
! case RSHIFT_EXPR:
! if (vect_supportable_shift (code, half_type))
! break;
!
! /* Try intermediate type - HALF_TYPE is not supported. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!
! if (!vect_supportable_shift (code, interm_type))
! return false;
!
! break;
!
! default:
! gcc_unreachable ();
! }
!
! /* There are four possible cases:
! 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
! the first statement in the sequence)
! a. The original, HALF_TYPE, is not enough - we replace the promotion
! from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
! b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
! promotion.
! 2. OPRND is defined by a pattern statement we created.
! a. Its type is not sufficient for the operation, we create a new stmt:
! a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
! this statement in NEW_DEF_STMT, and it is later put in
! STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
! b. OPRND is good to use in the new statement. */
! if (first)
! {
! if (interm_type)
! {
! /* Replace the original type conversion HALF_TYPE->TYPE with
! HALF_TYPE->INTERM_TYPE. */
! if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
! {
! new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
! /* Check if the already created pattern stmt is what we need. */
! if (!is_gimple_assign (new_stmt)
! || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
! || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
! return false;
!
! stmts->safe_push (def_stmt);
! oprnd = gimple_assign_lhs (new_stmt);
! }
! else
! {
! /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
! oprnd = gimple_assign_rhs1 (def_stmt);
! new_oprnd = make_ssa_name (interm_type);
! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
! stmts->safe_push (def_stmt);
! oprnd = new_oprnd;
! }
! }
! else
! {
! /* Retrieve the operand before the type promotion. */
! oprnd = gimple_assign_rhs1 (def_stmt);
! }
! }
! else
! {
! if (interm_type)
! {
! /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
! new_oprnd = make_ssa_name (interm_type);
! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
! oprnd = new_oprnd;
! *new_def_stmt = new_stmt;
! }
! /* Otherwise, OPRND is already set. */
}
! if (interm_type)
! *new_type = interm_type;
! else
! *new_type = half_type;
!
! *op0 = oprnd;
! *op1 = fold_convert (*new_type, const_oprnd);
! return true;
}
! /* Try to find a statement or a sequence of statements that can be performed
! on a smaller type:
! type x_t;
! TYPE x_T, res0_T, res1_T;
! loop:
! S1 x_t = *p;
! S2 x_T = (TYPE) x_t;
! S3 res0_T = op (x_T, C0);
! S4 res1_T = op (res0_T, C1);
! S5 ... = () res1_T; - type demotion
!
! where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
! constants.
! Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
! be 'type' or some intermediate type. For now, we expect S5 to be a type
! demotion operation. We also check that S3 and S4 have only one use. */
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! gimple *stmt = stmts->pop ();
! gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
! *use_stmt = NULL;
! tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
! tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
! bool first;
! tree type = NULL;
!
! first = true;
! while (1)
! {
! if (!vinfo_for_stmt (stmt)
! || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
! return NULL;
!
! new_def_stmt = NULL;
! if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
! &op0, &op1, &new_def_stmt,
! stmts))
! {
! if (first)
! return NULL;
! else
! break;
! }
! /* STMT can be performed on a smaller type. Check its uses. */
! use_stmt = vect_single_imm_use (stmt);
! if (!use_stmt || !is_gimple_assign (use_stmt))
! return NULL;
!
! /* Create pattern statement for STMT. */
! vectype = get_vectype_for_scalar_type (new_type);
! if (!vectype)
! return NULL;
!
! /* We want to collect all the statements for which we create pattern
! statetments, except for the case when the last statement in the
! sequence doesn't have a corresponding pattern statement. In such
! case we associate the last pattern statement with the last statement
! in the sequence. Therefore, we only add the original statement to
! the list if we know that it is not the last. */
! if (prev_stmt)
! stmts->safe_push (prev_stmt);
! var = vect_recog_temp_ssa_var (new_type, NULL);
! pattern_stmt
! = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
! new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
! if (dump_enabled_p ())
! {
! dump_printf_loc (MSG_NOTE, vect_location,
! "created pattern stmt: ");
! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
! }
! type = gimple_expr_type (stmt);
! prev_stmt = stmt;
! stmt = use_stmt;
!
! first = false;
! }
!
! /* We got a sequence. We expect it to end with a type demotion operation.
! Otherwise, we quit (for now). There are three possible cases: the
! conversion is to NEW_TYPE (we don't do anything), the conversion is to
! a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
! NEW_TYPE differs (we create a new conversion statement). */
! if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
! {
! use_lhs = gimple_assign_lhs (use_stmt);
! use_type = TREE_TYPE (use_lhs);
! /* Support only type demotion or signedess change. */
! if (!INTEGRAL_TYPE_P (use_type)
! || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
! return NULL;
! /* Check that NEW_TYPE is not bigger than the conversion result. */
! if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
! return NULL;
! if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
! || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
! {
! *type_out = get_vectype_for_scalar_type (use_type);
! if (!*type_out)
! return NULL;
! /* Create NEW_TYPE->USE_TYPE conversion. */
! new_oprnd = make_ssa_name (use_type);
! pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
!
! /* We created a pattern statement for the last statement in the
! sequence, so we don't need to associate it with the pattern
! statement created for PREV_STMT. Therefore, we add PREV_STMT
! to the list in order to mark it later in vect_pattern_recog_1. */
! if (prev_stmt)
! stmts->safe_push (prev_stmt);
! }
! else
! {
! if (prev_stmt)
! STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
! = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
! *type_out = vectype;
! }
! stmts->safe_push (use_stmt);
! }
! else
! /* TODO: support general case, create a conversion to the correct type. */
return NULL;
! /* Pattern detected. */
! vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
return pattern_stmt;
}
--- 1315,1632 ----
return pattern_stmt;
}
+ /* Recognize cases in which an operation is performed in one type WTYPE
+ but could be done more efficiently in a narrower type NTYPE. For example,
+ if we have:
+
+ ATYPE a; // narrower than NTYPE
+ BTYPE b; // narrower than NTYPE
+ WTYPE aw = (WTYPE) a;
+ WTYPE bw = (WTYPE) b;
+ WTYPE res = aw + bw; // only uses of aw and bw
+
+ then it would be more efficient to do:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b;
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ Other situations include things like:
+
+ ATYPE a; // NTYPE or narrower
+ WTYPE aw = (WTYPE) a;
+ WTYPE res = aw + b;
+
+ when only "(NTYPE) res" is significant. In that case it's more efficient
+ to truncate "b" and do the operation on NTYPE instead:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b; // truncation
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ All users of "res" should then use "resn" instead, making the final
+ statement dead (not marked as relevant). The final statement is still
+ needed to maintain the type correctness of the IR.
+
+ vect_determine_precisions has already determined the minimum
+ precision of the operation and the minimum precision required
+ by users of the result. */
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
! if (!last_stmt)
! return NULL;
! /* See whether we have found that this operation can be done on a
! narrower type without changing its semantics. */
! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
! unsigned int new_precision = last_stmt_info->operation_precision;
! if (!new_precision)
! return NULL;
! vec_info *vinfo = last_stmt_info->vinfo;
! tree lhs = gimple_assign_lhs (last_stmt);
! tree type = TREE_TYPE (lhs);
! tree_code code = gimple_assign_rhs_code (last_stmt);
!
! /* Keep the first operand of a COND_EXPR as-is: only the other two
! operands are interesting. */
! unsigned int first_op = (code == COND_EXPR ? 2 : 1);
!
! /* Check the operands. */
! unsigned int nops = gimple_num_ops (last_stmt) - first_op;
! auto_vec <vect_unpromoted_value, 3> unprom (nops);
! unprom.quick_grow (nops);
! unsigned int min_precision = 0;
! bool single_use_p = false;
! for (unsigned int i = 0; i < nops; ++i)
! {
! tree op = gimple_op (last_stmt, first_op + i);
! if (TREE_CODE (op) == INTEGER_CST)
! unprom[i].set_op (op, vect_constant_def);
! else if (TREE_CODE (op) == SSA_NAME)
! {
! bool op_single_use_p = true;
! if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
! &op_single_use_p))
! return NULL;
! /* If:
! (1) N bits of the result are needed;
! (2) all inputs are widened from M<N bits; and
! (3) one operand OP is a single-use SSA name
!
! we can shift the M->N widening from OP to the output
! without changing the number or type of extensions involved.
! This then reduces the number of copies of STMT_INFO.
!
! If instead of (3) more than one operand is a single-use SSA name,
! shifting the extension to the output is even more of a win.
!
! If instead:
!
! (1) N bits of the result are needed;
! (2) one operand OP2 is widened from M2<N bits;
! (3) another operand OP1 is widened from M1<M2 bits; and
! (4) both OP1 and OP2 are single-use
!
! the choice is between:
!
! (a) truncating OP2 to M1, doing the operation on M1,
! and then widening the result to N
!
! (b) widening OP1 to M2, doing the operation on M2, and then
! widening the result to N
!
! Both shift the M2->N widening of the inputs to the output.
! (a) additionally shifts the M1->M2 widening to the output;
! it requires fewer copies of STMT_INFO but requires an extra
! M2->M1 truncation.
!
! Which is better will depend on the complexity and cost of
! STMT_INFO, which is hard to predict at this stage. However,
! a clear tie-breaker in favor of (b) is the fact that the
! truncation in (a) increases the length of the operation chain.
!
! If instead of (4) only one of OP1 or OP2 is single-use,
! (b) is still a win over doing the operation in N bits:
! it still shifts the M2->N widening on the single-use operand
! to the output and reduces the number of STMT_INFO copies.
!
! If neither operand is single-use then operating on fewer than
! N bits might lead to more extensions overall. Whether it does
! or not depends on global information about the vectorization
! region, and whether that's a good trade-off would again
! depend on the complexity and cost of the statements involved,
! as well as things like register pressure that are not normally
! modelled at this stage. We therefore ignore these cases
! and just optimize the clear single-use wins above.
!
! Thus we take the maximum precision of the unpromoted operands
! and record whether any operand is single-use. */
! if (unprom[i].dt == vect_internal_def)
! {
! min_precision = MAX (min_precision,
! TYPE_PRECISION (unprom[i].type));
! single_use_p |= op_single_use_p;
! }
! }
! }
! /* Although the operation could be done in operation_precision, we have
! to balance that against introducing extra truncations or extensions.
! Calculate the minimum precision that can be handled efficiently.
!
! The loop above determined that the operation could be handled
! efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
! extension from the inputs to the output without introducing more
! instructions, and would reduce the number of instructions required
! for STMT_INFO itself.
!
! vect_determine_precisions has also determined that the result only
! needs min_output_precision bits. Truncating by a factor of N times
! requires a tree of N - 1 instructions, so if TYPE is N times wider
! than min_output_precision, doing the operation in TYPE and truncating
! the result requires N + (N - 1) = 2N - 1 instructions per output vector.
! In contrast:
!
! - truncating the input to a unary operation and doing the operation
! in the new type requires at most N - 1 + 1 = N instructions per
! output vector
!
! - doing the same for a binary operation requires at most
! (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
!
! Both unary and binary operations require fewer instructions than
! this if the operands were extended from a suitable truncated form.
! Thus there is usually nothing to lose by doing operations in
! min_output_precision bits, but there can be something to gain. */
! if (!single_use_p)
! min_precision = last_stmt_info->min_output_precision;
! else
! min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
! /* Apply the minimum efficient precision we just calculated. */
! if (new_precision < min_precision)
! new_precision = min_precision;
! if (new_precision >= TYPE_PRECISION (type))
! return NULL;
! vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
! *type_out = get_vectype_for_scalar_type (type);
! if (!*type_out)
! return NULL;
! /* We've found a viable pattern. Get the new type of the operation. */
! bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
! tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
!
! /* We specifically don't check here whether the target supports the
! new operation, since it might be something that a later pattern
! wants to rewrite anyway. If targets have a minimum element size
! for some optabs, we should pattern-match smaller ops to larger ops
! where beneficial. */
! tree new_vectype = get_vectype_for_scalar_type (new_type);
! if (!new_vectype)
! return NULL;
! if (dump_enabled_p ())
{
! dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
! dump_printf (MSG_NOTE, " to ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
! dump_printf (MSG_NOTE, "\n");
}
! /* Calculate the rhs operands for an operation on NEW_TYPE. */
! STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
! tree ops[3] = {};
! for (unsigned int i = 1; i < first_op; ++i)
! ops[i - 1] = gimple_op (last_stmt, i);
! vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
! new_type, &unprom[0], new_vectype);
!
! /* Use the operation to produce a result of type NEW_TYPE. */
! tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
! gimple *pattern_stmt = gimple_build_assign (new_var, code,
! ops[0], ops[1], ops[2]);
! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
! if (dump_enabled_p ())
! {
! dump_printf_loc (MSG_NOTE, vect_location,
! "created pattern stmt: ");
! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
}
! pattern_stmt = vect_convert_output (last_stmt_info, type,
! pattern_stmt, new_vectype);
! stmts->safe_push (last_stmt);
! return pattern_stmt;
}
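As a standalone illustration outside the patch (the helper names below are hypothetical, not vectorizer code), the demotion this pattern performs on the motivating loop can be checked directly: for 8-bit inputs, doing the add-and-shift in a 16-bit type gives exactly the same value as doing it in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone check: the motivating loop computes
   (b[i] + c[i]) >> 1 in int.  For 8-bit inputs the sum always fits
   in 16 bits, so the same operation done in a 16-bit type produces
   an identical result, which is what lets the pattern demote the
   operation from int to short.  */
int8_t avg_in_int (int8_t b, int8_t c)
{
  return (int8_t) (((int32_t) b + (int32_t) c) >> 1);
}

int8_t avg_in_short (int8_t b, int8_t c)
{
  return (int8_t) ((int16_t) ((int16_t) b + (int16_t) c) >> 1);
}
```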
+ /* Recognize cases in which the input to a cast is wider than its
+ output, and the input is fed by a widening operation. Fold this
+ by removing the unnecessary intermediate widening. E.g.:
! unsigned char a;
! unsigned int b = (unsigned int) a;
! unsigned short c = (unsigned short) b;
! -->
! unsigned short c = (unsigned short) a;
! Although this is rare in input IR, it is an expected side-effect
! of the over-widening pattern above.
! This is also beneficial for integer-to-float conversions, if the
! widened integer has more bits than the float, and if the unwidened
! input doesn't. */
! static gimple *
! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! /* Check for a cast, including an integer-to-float conversion. */
! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
! if (!last_stmt)
! return NULL;
! tree_code code = gimple_assign_rhs_code (last_stmt);
! if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
! return NULL;
! /* Make sure that the rhs is a scalar with a natural bitsize. */
! tree lhs = gimple_assign_lhs (last_stmt);
! if (!lhs)
! return NULL;
! tree lhs_type = TREE_TYPE (lhs);
! scalar_mode lhs_mode;
! if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
! || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
! return NULL;
! /* Check for a narrowing operation (from a vector point of view). */
! tree rhs = gimple_assign_rhs1 (last_stmt);
! tree rhs_type = TREE_TYPE (rhs);
! if (!INTEGRAL_TYPE_P (rhs_type)
! || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
! || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
! return NULL;
! /* Try to find an unpromoted input. */
! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
! vec_info *vinfo = last_stmt_info->vinfo;
! vect_unpromoted_value unprom;
! if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
! || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
! return NULL;
! /* If the bits above RHS_TYPE matter, make sure that they're the
! same when extending from UNPROM as they are when extending from RHS. */
! if (!INTEGRAL_TYPE_P (lhs_type)
! && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
! return NULL;
! /* We can get the same result by casting UNPROM directly, to avoid
! the unnecessary widening and narrowing. */
! vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
! *type_out = get_vectype_for_scalar_type (lhs_type);
! if (!*type_out)
return NULL;
! tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
! gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+ stmts->safe_push (last_stmt);
return pattern_stmt;
}
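A minimal standalone sketch of the fold itself (plain C, not patch code): widening to unsigned int and then narrowing to unsigned short produces the same value as casting the unpromoted input directly, so the intermediate widening can be removed.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the cast-forwprop fold: the widen-then-narrow sequence
   and the direct cast are value-equivalent for every input.  */
uint16_t narrow_via_uint (uint8_t a)
{
  uint32_t b = (uint32_t) a;   /* unnecessary intermediate widening */
  return (uint16_t) b;         /* then narrowing */
}

uint16_t narrow_direct (uint8_t a)
{
  return (uint16_t) a;         /* single cast, same result */
}
```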
*************** vect_recog_gather_scatter_pattern (vec<g
*** 4145,4150 ****
--- 4108,4498 ----
return pattern_stmt;
}
+ /* Return true if TYPE is a non-boolean integer type. These are the types
+ that we want to consider for narrowing. */
+
+ static bool
+ vect_narrowable_type_p (tree type)
+ {
+ return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+ }
+
+ /* Return true if the operation given by CODE can be truncated to N bits
+ when only N bits of the output are needed. This is only true if bit N+1
+ of the inputs has no effect on the low N bits of the result. */
+
+ static bool
+ vect_truncatable_operation_p (tree_code code)
+ {
+ switch (code)
+ {
+ case PLUS_EXPR:
+ case MINUS_EXPR:
+ case MULT_EXPR:
+ case BIT_AND_EXPR:
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ case COND_EXPR:
+ return true;
+
+ default:
+ return false;
+ }
+ }
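A standalone illustration (outside the patch) of the property that vect_truncatable_operation_p relies on: for codes like PLUS_EXPR, the low N bits of the result depend only on the low N bits of the inputs, so the whole operation can be done in an N-bit type.  The same is not true of, say, right shifts, which is why shifts are handled separately below.

```c
#include <assert.h>
#include <stdint.h>

/* For a truncatable operation such as addition, truncating the
   inputs to 8 bits first gives the same low 8 bits as doing the
   full-width addition and truncating afterwards.  */
uint8_t low8_add_wide (uint32_t a, uint32_t b)
{
  return (uint8_t) (a + b);                        /* 32-bit add, truncate */
}

uint8_t low8_add_narrow (uint32_t a, uint32_t b)
{
  return (uint8_t) ((uint8_t) a + (uint8_t) b);    /* truncate, 8-bit add */
}
```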
+
+ /* Record that STMT_INFO could be changed from operating on TYPE to
+ operating on a type with the precision and sign given by PRECISION
+ and SIGN respectively. PRECISION is an arbitrary bit precision;
+ it might not be a whole number of bytes. */
+
+ static void
+ vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+ unsigned int precision, signop sign)
+ {
+ /* Round the precision up to a power of two, and to at least one byte. */
+ precision = 1 << ceil_log2 (precision);
+ precision = MAX (precision, BITS_PER_UNIT);
+ if (precision < TYPE_PRECISION (type)
+ && (!stmt_info->operation_precision
+ || stmt_info->operation_precision > precision))
+ {
+ stmt_info->operation_precision = precision;
+ stmt_info->operation_sign = sign;
+ }
+ }
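The rounding at the top of vect_set_operation_type can be sketched in isolation (assuming 8-bit units, with a hypothetical helper name): `1 << ceil_log2 (precision)` followed by `MAX (..., BITS_PER_UNIT)` rounds an arbitrary bit precision up to a power of two no smaller than a byte.

```c
#include <assert.h>

/* Sketch of the precision rounding: round up to a power of two,
   with a minimum of BITS_PER_UNIT (taken to be 8 here).  */
unsigned int round_operation_precision (unsigned int precision)
{
  unsigned int p = 1;
  while (p < precision)      /* equivalent to 1 << ceil_log2 (precision) */
    p <<= 1;
  return p < 8 ? 8 : p;      /* MAX (precision, BITS_PER_UNIT) */
}
```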
+
+ /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+ non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
+ is an arbitrary bit precision; it might not be a whole number of bytes. */
+
+ static void
+ vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+ unsigned int min_input_precision)
+ {
+ /* This operation in isolation only requires the inputs to have
+ MIN_INPUT_PRECISION of precision.  However, that doesn't mean
+ that MIN_INPUT_PRECISION is a natural precision for the chain
+ as a whole. E.g. consider something like:
+
+ unsigned short *x, *y;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ The right shift can be done on unsigned chars, and only requires the
+ result of "*x & 0xf0" to be done on unsigned chars. But taking that
+ approach would mean turning a natural chain of single-vector unsigned
+ short operations into one that truncates "*x" and then extends
+ "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+ operation and one vector for each unsigned char operation.
+ This would be a significant pessimization.
+
+ Instead only propagate the maximum of this precision and the precision
+ required by the users of the result. This means that we don't pessimize
+ the case above but continue to optimize things like:
+
+ unsigned char *y;
+ unsigned short *x;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ Here we would truncate two vectors of *x to a single vector of
+ unsigned chars and use single-vector unsigned char operations for
+ everything else, rather than doing two unsigned short copies of
+ "(*x & 0xf0) >> 4" and then truncating the result. */
+ min_input_precision = MAX (min_input_precision,
+ stmt_info->min_output_precision);
+
+ if (min_input_precision < TYPE_PRECISION (type)
+ && (!stmt_info->min_input_precision
+ || stmt_info->min_input_precision > min_input_precision))
+ stmt_info->min_input_precision = min_input_precision;
+ }
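The first example in the comment above can be checked standalone in plain C (hypothetical function names): the right shift really does need only 8 bits of "x & 0xf0", so in isolation it could be done on unsigned chars even when x is an unsigned short.

```c
#include <assert.h>
#include <stdint.h>

/* (x & 0xf0) >> 4 computed via unsigned short and via unsigned char:
   the masked value fits in 8 bits, so truncating before the shift
   cannot change the result.  */
unsigned int shift_in_short (uint16_t x)
{
  return (uint16_t) (x & 0xf0) >> 4;
}

unsigned int shift_in_char (uint16_t x)
{
  return (uint8_t) (x & 0xf0) >> 4;
}
```

As the comment explains, the vectorizer nonetheless avoids this narrowing when the rest of the chain naturally works on unsigned shorts, since the extra truncation and extension would be a pessimization.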
+
+ /* Subroutine of vect_determine_min_output_precision. Return true if
+ we can calculate a reduced number of output bits for STMT_INFO,
+ whose result is LHS. */
+
+ static bool
+ vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+ {
+ /* Take the maximum precision required by users of the result. */
+ unsigned int precision = 0;
+ imm_use_iterator iter;
+ use_operand_p use;
+ FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+ {
+ gimple *use_stmt = USE_STMT (use);
+ if (is_gimple_debug (use_stmt))
+ continue;
+ if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+ return false;
+ stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+ if (!use_stmt_info->min_input_precision)
+ return false;
+ precision = MAX (precision, use_stmt_info->min_input_precision);
+ }
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+ precision);
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+ dump_printf (MSG_NOTE, " are significant\n");
+ }
+ stmt_info->min_output_precision = precision;
+ return true;
+ }
+
+ /* Calculate min_output_precision for STMT_INFO. */
+
+ static void
+ vect_determine_min_output_precision (stmt_vec_info stmt_info)
+ {
+ /* We're only interested in statements with a narrowable result. */
+ tree lhs = gimple_get_lhs (stmt_info->stmt);
+ if (!lhs
+ || TREE_CODE (lhs) != SSA_NAME
+ || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+ return;
+
+ if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+ stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+ }
+
+ /* Use range information to decide whether STMT (described by STMT_INFO)
+ could be done in a narrower type. This is effectively a forward
+ propagation, since it uses context-independent information that applies
+ to all users of an SSA name. */
+
+ static void
+ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+ {
+ tree lhs = gimple_assign_lhs (stmt);
+ if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+ return;
+
+ tree type = TREE_TYPE (lhs);
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ /* First see whether we have any useful range information for the result. */
+ unsigned int precision = TYPE_PRECISION (type);
+ signop sign = TYPE_SIGN (type);
+ wide_int min_value, max_value;
+ if (!vect_get_range_info (lhs, &min_value, &max_value))
+ return;
+
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int nops = gimple_num_ops (stmt);
+
+ if (!vect_truncatable_operation_p (code))
+ /* Check that all relevant input operands are compatible, and update
+ [MIN_VALUE, MAX_VALUE] to include their ranges. */
+ for (unsigned int i = 1; i < nops; ++i)
+ {
+ tree op = gimple_op (stmt, i);
+ if (TREE_CODE (op) == INTEGER_CST)
+ {
+ /* Don't require the integer to have RHS_TYPE (which it might
+ not for things like shift amounts, etc.), but do require it
+ to fit the type. */
+ if (!int_fits_type_p (op, type))
+ return;
+
+ min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+ max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+ }
+ else if (TREE_CODE (op) == SSA_NAME)
+ {
+ /* Ignore codes that don't take uniform arguments. */
+ if (!types_compatible_p (TREE_TYPE (op), type))
+ return;
+
+ wide_int op_min_value, op_max_value;
+ if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+ return;
+
+ min_value = wi::min (min_value, op_min_value, sign);
+ max_value = wi::max (max_value, op_max_value, sign);
+ }
+ else
+ return;
+ }
+
+ /* Try to switch signed types for unsigned types if we can.
+ This is better for two reasons. First, unsigned ops tend
+ to be cheaper than signed ops. Second, it means that we can
+ handle things like:
+
+ signed char c;
+ int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+
+ as:
+
+ signed char c;
+ unsigned short res_1 = (unsigned short) c & 0xff00;
+ int res = (int) res_1;
+
+ where the intermediate result res_1 has unsigned rather than
+ signed type. */
+ if (sign == SIGNED && !wi::neg_p (min_value))
+ sign = UNSIGNED;
+
+ /* See what precision is required for MIN_VALUE and MAX_VALUE. */
+ unsigned int precision1 = wi::min_precision (min_value, sign);
+ unsigned int precision2 = wi::min_precision (max_value, sign);
+ unsigned int value_precision = MAX (precision1, precision2);
+ if (value_precision >= precision)
+ return;
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without loss of precision: ",
+ sign == SIGNED ? "signed" : "unsigned",
+ value_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+
+ vect_set_operation_type (stmt_info, type, value_precision, sign);
+ vect_set_min_input_precision (stmt_info, type, value_precision);
+ }
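The signed-to-unsigned switch described in the comment can be illustrated standalone (hypothetical names, mirroring the comment's example): (int) c & 0xff00 has range [0x0000, 0xff00], so even though c is signed, the operation fits an unsigned 16-bit type and gives the same int result when done there.

```c
#include <assert.h>
#include <stdint.h>

/* Range-based narrowing: the masked value is always non-negative and
   at most 0xff00, so unsigned short precision suffices for the
   intermediate result res_1.  */
int mask_in_int (int8_t c)
{
  return (int) c & 0xff00;
}

int mask_in_ushort (int8_t c)
{
  uint16_t res_1 = (uint16_t) ((uint16_t) c & 0xff00);
  return (int) res_1;
}
```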
+
+ /* Use information about the users of STMT's result to decide whether
+ STMT (described by STMT_INFO) could be done in a narrower type.
+ This is effectively a backward propagation. */
+
+ static void
+ vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+ {
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int opno = (code == COND_EXPR ? 2 : 1);
+ tree type = TREE_TYPE (gimple_op (stmt, opno));
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ unsigned int precision = TYPE_PRECISION (type);
+ unsigned int operation_precision, min_input_precision;
+ switch (code)
+ {
+ CASE_CONVERT:
+ /* Only the bits that contribute to the output matter. Don't change
+ the precision of the operation itself. */
+ operation_precision = precision;
+ min_input_precision = stmt_info->min_output_precision;
+ break;
+
+ case LSHIFT_EXPR:
+ case RSHIFT_EXPR:
+ {
+ tree shift = gimple_assign_rhs2 (stmt);
+ if (TREE_CODE (shift) != INTEGER_CST
+ || !wi::ltu_p (wi::to_widest (shift), precision))
+ return;
+ unsigned int const_shift = TREE_INT_CST_LOW (shift);
+ if (code == LSHIFT_EXPR)
+ {
+ /* We need CONST_SHIFT fewer bits of the input. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = (MAX (operation_precision, const_shift)
+ - const_shift);
+ }
+ else
+ {
+ /* We need CONST_SHIFT extra bits to do the operation. */
+ operation_precision = (stmt_info->min_output_precision
+ + const_shift);
+ min_input_precision = operation_precision;
+ }
+ break;
+ }
+
+ default:
+ if (vect_truncatable_operation_p (code))
+ {
+ /* Input bit N has no effect on output bits N-1 and lower. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = operation_precision;
+ break;
+ }
+ return;
+ }
+
+ if (operation_precision < precision)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without affecting users: ",
+ TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+ operation_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+ vect_set_operation_type (stmt_info, type, operation_precision,
+ TYPE_SIGN (type));
+ }
+ vect_set_min_input_precision (stmt_info, type, min_input_precision);
+ }
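The left-shift rule above ("we need CONST_SHIFT fewer bits of the input") can be checked with a standalone sketch: if only the low 8 bits of x << 3 are needed, only the low 8 - 3 = 5 bits of x can affect them.

```c
#include <assert.h>
#include <stdint.h>

/* Backward propagation for LSHIFT_EXPR: masking the input down to
   min_input_precision = 8 - 3 = 5 bits leaves the low 8 output
   bits unchanged.  */
uint8_t low8_lshift_wide (uint32_t x)
{
  return (uint8_t) (x << 3);
}

uint8_t low8_lshift_narrow (uint32_t x)
{
  return (uint8_t) ((x & 0x1f) << 3);   /* keep only 5 input bits */
}
```

The right-shift case goes the other way: producing min_output_precision valid output bits of x >> 3 requires min_output_precision + 3 input bits, hence the wider operation_precision in that arm.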
+
+ /* Handle vect_determine_precisions for STMT_INFO, given that we
+ have already done so for the users of its result. */
+
+ void
+ vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+ {
+ vect_determine_min_output_precision (stmt_info);
+ if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+ {
+ vect_determine_precisions_from_range (stmt_info, stmt);
+ vect_determine_precisions_from_users (stmt_info, stmt);
+ }
+ }
+
+ /* Walk backwards through the vectorizable region to determine the
+ values of these fields:
+
+ - min_output_precision
+ - min_input_precision
+ - operation_precision
+ - operation_sign. */
+
+ void
+ vect_determine_precisions (vec_info *vinfo)
+ {
+ DUMP_VECT_SCOPE ("vect_determine_precisions");
+
+ if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+ unsigned int nbbs = loop->num_nodes;
+
+ for (unsigned int i = 0; i < nbbs; i++)
+ {
+ basic_block bb = bbs[nbbs - i - 1];
+ for (gimple_stmt_iterator si = gsi_last_bb (bb);
+ !gsi_end_p (si); gsi_prev (&si))
+ vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+ }
+ }
+ else
+ {
+ bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+ gimple_stmt_iterator si = bb_vinfo->region_end;
+ gimple *stmt;
+ do
+ {
+ if (!gsi_stmt (si))
+ si = gsi_last_bb (bb_vinfo->bb);
+ else
+ gsi_prev (&si);
+ stmt = gsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+ vect_determine_stmt_precisions (stmt_info);
+ }
+ while (stmt != gsi_stmt (bb_vinfo->region_begin));
+ }
+ }
+
typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
struct vect_recog_func
*************** struct vect_recog_func
*** 4157,4169 ****
taken which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
- { vect_recog_over_widening_pattern, "over_widening" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
--- 4505,4518 ----
taken which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
+ { vect_recog_over_widening_pattern, "over_widening" },
+ { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
*************** vect_pattern_recog (vec_info *vinfo)
*** 4437,4442 ****
--- 4786,4793 ----
unsigned int i, j;
auto_vec<gimple *, 1> stmts_to_replace;
+ vect_determine_precisions (vinfo);
+
DUMP_VECT_SCOPE ("vect_pattern_recog");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 62,69 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 62,70 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 58,64 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 58,66 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,63 ****
return 0;
}
! /* Final value stays in int, so no over-widening is detected at the moment. */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,68 ----
return 0;
}
! /* This is an over-widening even though the final result is still an int.
! It's better to do one vector of ops on chars and then widen than to
! widen and then do 4 vectors of ops on ints. */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,63 ****
return 0;
}
! /* Final value stays in int, so no over-widening is detected at the moment. */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,68 ----
return 0;
}
! /* This is an over-widening even though the final result is still an int.
! It's better to do one vector of ops on chars and then widen than to
! widen and then do 4 vectors of ops on ints. */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 57,62 ****
return 0;
}
! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,65 ----
return 0;
}
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 59,65 ****
return 0;
}
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 59,67 ----
return 0;
}
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 66,73 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 66,74 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-06-20 11:36:19.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-06-20 11:36:20.135890693 +0100
*************** int main (void)
*** 62,68 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 62,70 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,66 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ /* Deliberate use of signed >>. */
+ #define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char *restrict c) \
+ { \
+ a[0] = (b[0] + c[0]) >> 1; \
+ a[1] = (b[1] + c[1]) >> 1; \
+ a[2] = (b[2] + c[2]) >> 1; \
+ a[3] = (b[3] + c[3]) >> 1; \
+ a[4] = (b[4] + c[4]) >> 1; \
+ a[5] = (b[5] + c[5]) >> 1; \
+ a[6] = (b[6] + c[6]) >> 1; \
+ a[7] = (b[7] + c[7]) >> 1; \
+ a[8] = (b[8] + c[8]) >> 1; \
+ a[9] = (b[9] + c[9]) >> 1; \
+ a[10] = (b[10] + c[10]) >> 1; \
+ a[11] = (b[11] + c[11]) >> 1; \
+ a[12] = (b[12] + c[12]) >> 1; \
+ a[13] = (b[13] + c[13]) >> 1; \
+ a[14] = (b[14] + c[14]) >> 1; \
+ a[15] = (b[15] + c[15]) >> 1; \
+ }
+
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+
+ #define N 16
+
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
+ { \
+ SIGNEDNESS char a[N], b[N], c[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ c[i] = BASE_C + i * 14; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, c); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \
+ __builtin_abort (); \
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 10);
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,65 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ /* Deliberate use of signed >>. */
+ #define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char c) \
+ { \
+ a[0] = (b[0] + c) >> 1; \
+ a[1] = (b[1] + c) >> 1; \
+ a[2] = (b[2] + c) >> 1; \
+ a[3] = (b[3] + c) >> 1; \
+ a[4] = (b[4] + c) >> 1; \
+ a[5] = (b[5] + c) >> 1; \
+ a[6] = (b[6] + c) >> 1; \
+ a[7] = (b[7] + c) >> 1; \
+ a[8] = (b[8] + c) >> 1; \
+ a[9] = (b[9] + c) >> 1; \
+ a[10] = (b[10] + c) >> 1; \
+ a[11] = (b[11] + c) >> 1; \
+ a[12] = (b[12] + c) >> 1; \
+ a[13] = (b[13] + c) >> 1; \
+ a[14] = (b[14] + c) >> 1; \
+ a[15] = (b[15] + c) >> 1; \
+ }
+
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+
+ #define N 16
+
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, C) \
+ { \
+ SIGNEDNESS char a[N], b[N], c[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, C); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + C + i * 15) >> 1) \
+ __builtin_abort (); \
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 250);
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ /* Deliberate use of signed >>. */
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) >> 1;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c 2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+
+ #include "vect-over-widen-5.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c 2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #define D -120
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+ {
+ int promoted_d = d;
+ for (int i = 0; i < N; ++i)
+ /* Deliberate use of signed >>. */
+ a[i] = (b[i] + c[i] + promoted_d) >> 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, D);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c 2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #define D 251
+ #endif
+
+ #include "vect-over-widen-7.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c 2018-06-20 11:36:20.139890658 +0100
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-9.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,63 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short, with "res"
+ being extended for the store to d[i]. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, int *restrict d)
+ {
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ d[i] = res;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ int d[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ if (d[i] != res)
+ __builtin_abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-11.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+
+ #define N 50
+
+ /* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-13.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+
+ #define N 50
+
+ /* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short, with the result being extended to int for the
+ store. */
+ void __attribute__ ((noipa))
+ f (int *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ int a[N];
+ SIGNEDNESS char b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-15.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 1024
+
+ /* This should not be treated as an over-widening pattern, even though
+ "((b[i] & 0xef) | 0x80)" could be done in unsigned chars. */
+
+ void __attribute__ ((noipa))
+ f (unsigned short *restrict a, unsigned short *restrict b)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned short a[N], b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 1024
+
+ /* This should be treated as an over-widening pattern: we can truncate
+ b to unsigned char after loading it and do all the computation in
+ unsigned char. */
+
+ void __attribute__ ((noipa))
+ f (unsigned char *restrict a, unsigned short *restrict b)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned char a[N];
+ unsigned short b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned int di = d[i];
+ unsigned int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ int di = d[i];
+ int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c 2018-06-20 11:36:20.135890693 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ a[i] = d[i];
+ b[i] = e[i];
+ c[i] = d[i] + e[i];
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
* Re: [14/n] PR85694: Rework overwidening detection
2018-06-20 10:37 [14/n] PR85694: Rework overwidening detection Richard Sandiford
@ 2018-06-29 12:56 ` Richard Sandiford
2018-07-02 11:02 ` Christophe Lyon
2018-07-02 13:12 ` Richard Biener
0 siblings, 2 replies; 10+ messages in thread
From: Richard Sandiford @ 2018-06-29 12:56 UTC (permalink / raw)
To: gcc-patches
Richard Sandiford <richard.sandiford@arm.com> writes:
> This patch is the main part of PR85694. The aim is to recognise at least:
>
> signed char *a, *b, *c;
> ...
> for (int i = 0; i < 2048; i++)
> c[i] = (a[i] + b[i]) >> 1;
>
> as an over-widening pattern, since the addition and shift can be done
> on shorts rather than ints. However, it ended up being a lot more
> general than that.
>
> The current over-widening pattern detection is limited to a few simple
> cases: logical ops with immediate second operands, and shifts by a
> constant. These cases are enough for common pixel-format conversion
> and can be detected in a peephole way.
>
> The loop above requires two generalisations of the current code: support
> for addition as well as logical ops, and support for non-constant second
> operands. These are harder to detect in the same peephole way, so the
> patch tries to take a more global approach.
>
> The idea is to get information about the minimum operation width
> in two ways:
>
> (1) by using the range information attached to the SSA_NAMEs
> (effectively a forward walk, since the range info is
> context-independent).
>
> (2) by back-propagating the number of output bits required by
> users of the result.
>
> As explained in the comments, there's a balance to be struck between
> narrowing an individual operation and fitting in with the surrounding
> code. The approach is pretty conservative: if we could narrow an
> operation to N bits without changing its semantics, it's OK to do that if:
>
> - no operations later in the chain require more than N bits; or
>
> - all internally-defined inputs are extended from N bits or fewer,
> and at least one of them is single-use.
>
> See the comments for the rationale.
>
> I didn't bother adding STMT_VINFO_* wrappers for the new fields
> since the code seemed more readable without.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Here's a version rebased on top of current trunk. Changes from last time:
- reintroduce dump_generic_expr_loc, with the obvious change to the
prototype
- fix a typo in a comment
- use vect_element_precision from the new version of 12/n.
Tested as before. OK to install?
Richard
2018-06-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
* dumpfile.c (dump_generic_expr): Fix formatting.
(dump_generic_expr_loc): New function.
(dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information. Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.
gcc/testsuite/
* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
over-widening messages.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-5.c: Likewise.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* gcc.dg/vect/vect-over-widen-16.c: Likewise.
* gcc.dg/vect/vect-over-widen-17.c: Likewise.
* gcc.dg/vect/vect-over-widen-18.c: Likewise.
* gcc.dg/vect/vect-over-widen-19.c: Likewise.
* gcc.dg/vect/vect-over-widen-20.c: Likewise.
* gcc.dg/vect/vect-over-widen-21.c: Likewise.
Index: gcc/poly-int.h
===================================================================
*** gcc/poly-int.h 2018-06-29 12:33:06.000000000 +0100
--- gcc/poly-int.h 2018-06-29 12:33:06.721263572 +0100
*************** print_dec (const poly_int_pod<N, C> &val
*** 2420,2425 ****
--- 2420,2444 ----
poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
}
+ /* Use print_hex to print VALUE to FILE. */
+
+ template<unsigned int N, typename C>
+ void
+ print_hex (const poly_int_pod<N, C> &value, FILE *file)
+ {
+ if (value.is_constant ())
+ print_hex (value.coeffs[0], file);
+ else
+ {
+ fprintf (file, "[");
+ for (unsigned int i = 0; i < N; ++i)
+ {
+ print_hex (value.coeffs[i], file);
+ fputc (i == N - 1 ? ']' : ',', file);
+ }
+ }
+ }
+
/* Helper for calculating the distance between two points P1 and P2,
in cases where known_le (P1, P2). T1 and T2 are the types of the
two positions, in either order. The coefficients of P2 - P1 have
Index: gcc/dumpfile.h
===================================================================
*** gcc/dumpfile.h 2018-06-29 12:33:06.000000000 +0100
--- gcc/dumpfile.h 2018-06-29 12:33:06.717263602 +0100
*************** extern void dump_printf_loc (dump_flags_
*** 425,430 ****
--- 425,432 ----
const char *, ...) ATTRIBUTE_PRINTF_3;
extern void dump_function (int phase, tree fn);
extern void dump_basic_block (dump_flags_t, basic_block, int);
+ extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
+ dump_flags_t, tree);
extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
dump_flags_t, gimple *, int);
*************** extern bool enable_rtl_dump_file (void);
*** 434,439 ****
--- 436,443 ----
template<unsigned int N, typename C>
void dump_dec (dump_flags_t, const poly_int<N, C> &);
+ extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+ extern void dump_hex (dump_flags_t, const poly_wide_int &);
/* In tree-dump.c */
extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
*** gcc/dumpfile.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/dumpfile.c 2018-06-29 12:33:06.717263602 +0100
*************** dump_generic_expr (dump_flags_t dump_kin
*** 498,507 ****
--- 498,527 ----
tree t)
{
if (dump_file && (dump_kind & pflags))
+ print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
+ }
+
+ /* Similar to dump_generic_expr, except additionally print source location. */
+
+ void
+ dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
+ dump_flags_t extra_dump_flags, tree t)
+ {
+ location_t srcloc = loc.get_location_t ();
+ if (dump_file && (dump_kind & pflags))
+ {
+ dump_loc (dump_kind, dump_file, srcloc);
print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
+ }
if (alt_dump_file && (dump_kind & alt_flags))
+ {
+ dump_loc (dump_kind, alt_dump_file, srcloc);
print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
+ }
}
/* Output a formatted message using FORMAT on appropriate dump streams. */
*************** template void dump_dec (dump_flags_t, co
*** 573,578 ****
--- 593,620 ----
template void dump_dec (dump_flags_t, const poly_offset_int &);
template void dump_dec (dump_flags_t, const poly_widest_int &);
+ void
+ dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+ {
+ if (dump_file && (dump_kind & pflags))
+ print_dec (value, dump_file, sgn);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_dec (value, alt_dump_file, sgn);
+ }
+
+ /* Output VALUE in hexadecimal to appropriate dump streams. */
+
+ void
+ dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+ {
+ if (dump_file && (dump_kind & pflags))
+ print_hex (value, dump_file);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_hex (value, alt_dump_file);
+ }
+
/* Start a dump for PHASE. Store user-supplied dump flags in
*FLAG_PTR. Return the number of streams opened. Set globals
DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h 2018-06-29 12:33:06.000000000 +0100
--- gcc/tree-vectorizer.h 2018-06-29 12:33:06.725263540 +0100
*************** typedef struct _stmt_vec_info {
*** 899,904 ****
--- 899,919 ----
/* The number of scalar stmt references from active SLP instances. */
unsigned int num_slp_uses;
+
+ /* If nonzero, the lhs of the statement could be truncated to this
+ many bits without affecting any users of the result. */
+ unsigned int min_output_precision;
+
+ /* If nonzero, all non-boolean input operands have the same precision,
+ and they could each be truncated to this many bits without changing
+ the result. */
+ unsigned int min_input_precision;
+
+ /* If OPERATION_PRECISION is nonzero, the statement could be performed on
+ an integer with the sign and number of bits given by OPERATION_SIGN
+ and OPERATION_PRECISION without changing the result. */
+ unsigned int operation_precision;
+ signop operation_sign;
} *stmt_vec_info;
/* Information about a gather/scatter call. */
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/tree-vect-patterns.c 2018-06-29 12:33:06.721263572 +0100
*************** Software Foundation; either version 3, o
*** 47,52 ****
--- 47,86 ----
#include "omp-simd-clone.h"
#include "predict.h"
+ /* Return true if we have a useful VR_RANGE range for VAR, storing it
+ in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
+
+ static bool
+ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+ {
+ value_range_type vr_type = get_range_info (var, min_value, max_value);
+ wide_int nonzero = get_nonzero_bits (var);
+ signop sgn = TYPE_SIGN (TREE_TYPE (var));
+ if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+ nonzero, sgn) == VR_RANGE)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has range [");
+ dump_hex (MSG_NOTE, *min_value);
+ dump_printf (MSG_NOTE, ", ");
+ dump_hex (MSG_NOTE, *max_value);
+ dump_printf (MSG_NOTE, "]\n");
+ }
+ return true;
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has no range info\n");
+ }
+ return false;
+ }
+ }
+
/* Report that we've found an instance of pattern PATTERN in
statement STMT. */
*************** vect_supportable_direct_optab_p (tree ot
*** 190,229 ****
return true;
}
- /* Check whether STMT2 is in the same loop or basic block as STMT1.
- Which of the two applies depends on whether we're currently doing
- loop-based or basic-block-based vectorization, as determined by
- the vinfo_for_stmt for STMT1 (which must be defined).
-
- If this returns true, vinfo_for_stmt for STMT2 is guaranteed
- to be defined as well. */
-
- static bool
- vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
- {
- stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
- return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
- }
-
- /* If the LHS of DEF_STMT has a single use, and that statement is
- in the same loop or basic block, return it. */
-
- static gimple *
- vect_single_imm_use (gimple *def_stmt)
- {
- tree lhs = gimple_assign_lhs (def_stmt);
- use_operand_p use_p;
- gimple *use_stmt;
-
- if (!single_imm_use (lhs, &use_p, &use_stmt))
- return NULL;
-
- if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
- return NULL;
-
- return use_stmt;
- }
-
/* Round bit precision PRECISION up to a full element. */
static unsigned int
--- 224,229 ----
*************** vect_unpromoted_value::set_op (tree op_i
*** 347,353 ****
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
! input in UNPROM, otherwise describe OP' in UNPROM.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
--- 347,355 ----
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
! input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
! is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
! have more than one user.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
*************** vect_unpromoted_value::set_op (tree op_i
*** 374,380 ****
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! vect_unpromoted_value *unprom)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
--- 376,383 ----
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
! vect_unpromoted_value *unprom,
! bool *single_use_p = NULL)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
*************** vect_look_through_possible_promotion (ve
*** 420,426 ****
if (!def_stmt)
break;
if (dt == vect_internal_def)
! caster = vinfo_for_stmt (def_stmt);
else
caster = NULL;
gassign *assign = dyn_cast <gassign *> (def_stmt);
--- 423,436 ----
if (!def_stmt)
break;
if (dt == vect_internal_def)
! {
! caster = vinfo_for_stmt (def_stmt);
! /* Ignore pattern statements, since we don't link uses for them. */
! if (single_use_p
! && !STMT_VINFO_RELATED_STMT (caster)
! && !has_single_use (res))
! *single_use_p = false;
! }
else
caster = NULL;
gassign *assign = dyn_cast <gassign *> (def_stmt);
*************** vect_recog_widen_sum_pattern (vec<gimple
*** 1371,1733 ****
return pattern_stmt;
}
! /* Return TRUE if the operation in STMT can be performed on a smaller type.
! Input:
! STMT - a statement to check.
! DEF - we support operations with two operands, one of which is constant.
! The other operand can be defined by a demotion operation, or by a
! previous statement in a sequence of over-promoted operations. In the
! later case DEF is used to replace that operand. (It is defined by a
! pattern statement we created for the previous statement in the
! sequence).
!
! Input/output:
! NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
! NULL, it's the type of DEF.
! STMTS - additional pattern statements. If a pattern statement (type
! conversion) is created in this function, its original statement is
! added to STMTS.
! Output:
! OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
! operands to use in the new pattern statement for STMT (will be created
! in vect_recog_over_widening_pattern ()).
! NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
! statements for STMT: the first one is a type promotion and the second
! one is the operation itself. We return the type promotion statement
! in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
! the second pattern statement. */
! static bool
! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
! tree *op0, tree *op1, gimple **new_def_stmt,
! vec<gimple *> *stmts)
! {
! enum tree_code code;
! tree const_oprnd, oprnd;
! tree interm_type = NULL_TREE, half_type, new_oprnd, type;
! gimple *def_stmt, *new_stmt;
! bool first = false;
! bool promotion;
! *op0 = NULL_TREE;
! *op1 = NULL_TREE;
! *new_def_stmt = NULL;
! if (!is_gimple_assign (stmt))
! return false;
! code = gimple_assign_rhs_code (stmt);
! if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
! && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
! return false;
! oprnd = gimple_assign_rhs1 (stmt);
! const_oprnd = gimple_assign_rhs2 (stmt);
! type = gimple_expr_type (stmt);
! if (TREE_CODE (oprnd) != SSA_NAME
! || TREE_CODE (const_oprnd) != INTEGER_CST)
! return false;
! /* If oprnd has other uses besides that in stmt we cannot mark it
! as being part of a pattern only. */
! if (!has_single_use (oprnd))
! return false;
! /* If we are in the middle of a sequence, we use DEF from a previous
! statement. Otherwise, OPRND has to be a result of type promotion. */
! if (*new_type)
! {
! half_type = *new_type;
! oprnd = def;
! }
! else
{
! first = true;
! if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
! &promotion)
! || !promotion
! || !vect_same_loop_or_bb_p (stmt, def_stmt))
! return false;
}
! /* Can we perform the operation on a smaller type? */
! switch (code)
! {
! case BIT_IOR_EXPR:
! case BIT_XOR_EXPR:
! case BIT_AND_EXPR:
! if (!int_fits_type_p (const_oprnd, half_type))
! {
! /* HALF_TYPE is not enough. Try a bigger type if possible. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
! if (!int_fits_type_p (const_oprnd, interm_type))
! return false;
! }
!
! break;
!
! case LSHIFT_EXPR:
! /* Try intermediate type - HALF_TYPE is not enough for sure. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
! (e.g., if the original value was char, the shift amount is at most 8
! if we want to use short). */
! if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!
! if (!vect_supportable_shift (code, interm_type))
! return false;
!
! break;
!
! case RSHIFT_EXPR:
! if (vect_supportable_shift (code, half_type))
! break;
!
! /* Try intermediate type - HALF_TYPE is not supported. */
! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
! return false;
!
! interm_type = build_nonstandard_integer_type (
! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
!
! if (!vect_supportable_shift (code, interm_type))
! return false;
!
! break;
!
! default:
! gcc_unreachable ();
! }
!
! /* There are four possible cases:
! 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
! the first statement in the sequence)
! a. The original, HALF_TYPE, is not enough - we replace the promotion
! from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
! b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
! promotion.
! 2. OPRND is defined by a pattern statement we created.
! a. Its type is not sufficient for the operation, we create a new stmt:
! a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
! this statement in NEW_DEF_STMT, and it is later put in
! STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
! b. OPRND is good to use in the new statement. */
! if (first)
! {
! if (interm_type)
! {
! /* Replace the original type conversion HALF_TYPE->TYPE with
! HALF_TYPE->INTERM_TYPE. */
! if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
! {
! new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
! /* Check if the already created pattern stmt is what we need. */
! if (!is_gimple_assign (new_stmt)
! || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
! || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
! return false;
!
! stmts->safe_push (def_stmt);
! oprnd = gimple_assign_lhs (new_stmt);
! }
! else
! {
! /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
! oprnd = gimple_assign_rhs1 (def_stmt);
! new_oprnd = make_ssa_name (interm_type);
! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
! stmts->safe_push (def_stmt);
! oprnd = new_oprnd;
! }
! }
! else
! {
! /* Retrieve the operand before the type promotion. */
! oprnd = gimple_assign_rhs1 (def_stmt);
! }
! }
! else
! {
! if (interm_type)
! {
! /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
! new_oprnd = make_ssa_name (interm_type);
! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
! oprnd = new_oprnd;
! *new_def_stmt = new_stmt;
! }
! /* Otherwise, OPRND is already set. */
}
! if (interm_type)
! *new_type = interm_type;
! else
! *new_type = half_type;
! *op0 = oprnd;
! *op1 = fold_convert (*new_type, const_oprnd);
!
! return true;
}
! /* Try to find a statement or a sequence of statements that can be performed
! on a smaller type:
! type x_t;
! TYPE x_T, res0_T, res1_T;
! loop:
! S1 x_t = *p;
! S2 x_T = (TYPE) x_t;
! S3 res0_T = op (x_T, C0);
! S4 res1_T = op (res0_T, C1);
! S5 ... = () res1_T; - type demotion
!
! where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
! constants.
! Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
! be 'type' or some intermediate type. For now, we expect S5 to be a type
! demotion operation. We also check that S3 and S4 have only one use. */
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! gimple *stmt = stmts->pop ();
! gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
! *use_stmt = NULL;
! tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
! tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
! bool first;
! tree type = NULL;
!
! first = true;
! while (1)
! {
! if (!vinfo_for_stmt (stmt)
! || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
! return NULL;
!
! new_def_stmt = NULL;
! if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
! &op0, &op1, &new_def_stmt,
! stmts))
! {
! if (first)
! return NULL;
! else
! break;
! }
! /* STMT can be performed on a smaller type. Check its uses. */
! use_stmt = vect_single_imm_use (stmt);
! if (!use_stmt || !is_gimple_assign (use_stmt))
! return NULL;
!
! /* Create pattern statement for STMT. */
! vectype = get_vectype_for_scalar_type (new_type);
! if (!vectype)
! return NULL;
!
! /* We want to collect all the statements for which we create pattern
! statetments, except for the case when the last statement in the
! sequence doesn't have a corresponding pattern statement. In such
! case we associate the last pattern statement with the last statement
! in the sequence. Therefore, we only add the original statement to
! the list if we know that it is not the last. */
! if (prev_stmt)
! stmts->safe_push (prev_stmt);
! var = vect_recog_temp_ssa_var (new_type, NULL);
! pattern_stmt
! = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
! new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
! if (dump_enabled_p ())
! {
! dump_printf_loc (MSG_NOTE, vect_location,
! "created pattern stmt: ");
! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
! }
! type = gimple_expr_type (stmt);
! prev_stmt = stmt;
! stmt = use_stmt;
!
! first = false;
! }
!
! /* We got a sequence. We expect it to end with a type demotion operation.
! Otherwise, we quit (for now). There are three possible cases: the
! conversion is to NEW_TYPE (we don't do anything), the conversion is to
! a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
! NEW_TYPE differs (we create a new conversion statement). */
! if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
! {
! use_lhs = gimple_assign_lhs (use_stmt);
! use_type = TREE_TYPE (use_lhs);
! /* Support only type demotion or signedess change. */
! if (!INTEGRAL_TYPE_P (use_type)
! || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
! return NULL;
! /* Check that NEW_TYPE is not bigger than the conversion result. */
! if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
! return NULL;
! if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
! || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
! {
! *type_out = get_vectype_for_scalar_type (use_type);
! if (!*type_out)
! return NULL;
! /* Create NEW_TYPE->USE_TYPE conversion. */
! new_oprnd = make_ssa_name (use_type);
! pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
!
! /* We created a pattern statement for the last statement in the
! sequence, so we don't need to associate it with the pattern
! statement created for PREV_STMT. Therefore, we add PREV_STMT
! to the list in order to mark it later in vect_pattern_recog_1. */
! if (prev_stmt)
! stmts->safe_push (prev_stmt);
! }
! else
! {
! if (prev_stmt)
! STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
! = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
! *type_out = vectype;
! }
! stmts->safe_push (use_stmt);
! }
! else
! /* TODO: support general case, create a conversion to the correct type. */
return NULL;
! /* Pattern detected. */
! vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
return pattern_stmt;
}
--- 1381,1698 ----
return pattern_stmt;
}
+ /* Recognize cases in which an operation is performed in one type WTYPE
+ but could be done more efficiently in a narrower type NTYPE. For example,
+ if we have:
+
+ ATYPE a; // narrower than NTYPE
+ BTYPE b; // narrower than NTYPE
+ WTYPE aw = (WTYPE) a;
+ WTYPE bw = (WTYPE) b;
+ WTYPE res = aw + bw; // only uses of aw and bw
+
+ then it would be more efficient to do:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b;
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ Other situations include things like:
+
+ ATYPE a; // NTYPE or narrower
+ WTYPE aw = (WTYPE) a;
+ WTYPE res = aw + b;
+
+ when only "(NTYPE) res" is significant. In that case it's more efficient
+ to truncate "b" and do the operation on NTYPE instead:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b; // truncation
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ All users of "res" should then use "resn" instead, making the final
+ statement dead (not marked as relevant). The final statement is still
+ needed to maintain the type correctness of the IR.
+
+ vect_determine_precisions has already determined the minimum
! precision of the operation and the minimum precision required
+ by users of the result. */
! static gimple *
! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
! if (!last_stmt)
! return NULL;
! /* See whether we have found that this operation can be done on a
! narrower type without changing its semantics. */
! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
! unsigned int new_precision = last_stmt_info->operation_precision;
! if (!new_precision)
! return NULL;
! vec_info *vinfo = last_stmt_info->vinfo;
! tree lhs = gimple_assign_lhs (last_stmt);
! tree type = TREE_TYPE (lhs);
! tree_code code = gimple_assign_rhs_code (last_stmt);
!
! /* Keep the first operand of a COND_EXPR as-is: only the other two
! operands are interesting. */
! unsigned int first_op = (code == COND_EXPR ? 2 : 1);
! /* Check the operands. */
! unsigned int nops = gimple_num_ops (last_stmt) - first_op;
! auto_vec <vect_unpromoted_value, 3> unprom (nops);
! unprom.quick_grow (nops);
! unsigned int min_precision = 0;
! bool single_use_p = false;
! for (unsigned int i = 0; i < nops; ++i)
! {
! tree op = gimple_op (last_stmt, first_op + i);
! if (TREE_CODE (op) == INTEGER_CST)
! unprom[i].set_op (op, vect_constant_def);
! else if (TREE_CODE (op) == SSA_NAME)
! {
! bool op_single_use_p = true;
! if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
! &op_single_use_p))
! return NULL;
! /* If:
! (1) N bits of the result are needed;
! (2) all inputs are widened from M<N bits; and
! (3) one operand OP is a single-use SSA name
!
! we can shift the M->N widening from OP to the output
! without changing the number or type of extensions involved.
! This then reduces the number of copies of STMT_INFO.
!
! If instead of (3) more than one operand is a single-use SSA name,
! shifting the extension to the output is even more of a win.
!
! If instead:
!
! (1) N bits of the result are needed;
! (2) one operand OP2 is widened from M2<N bits;
! (3) another operand OP1 is widened from M1<M2 bits; and
! (4) both OP1 and OP2 are single-use
!
! the choice is between:
!
! (a) truncating OP2 to M1, doing the operation on M1,
! and then widening the result to N
!
! (b) widening OP1 to M2, doing the operation on M2, and then
! widening the result to N
!
! Both shift the M2->N widening of the inputs to the output.
! (a) additionally shifts the M1->M2 widening to the output;
! it requires fewer copies of STMT_INFO but requires an extra
! M2->M1 truncation.
!
! Which is better will depend on the complexity and cost of
! STMT_INFO, which is hard to predict at this stage. However,
! a clear tie-breaker in favor of (b) is the fact that the
! truncation in (a) increases the length of the operation chain.
!
! If instead of (4) only one of OP1 or OP2 is single-use,
! (b) is still a win over doing the operation in N bits:
! it still shifts the M2->N widening on the single-use operand
! to the output and reduces the number of STMT_INFO copies.
!
! If neither operand is single-use then operating on fewer than
! N bits might lead to more extensions overall. Whether it does
! or not depends on global information about the vectorization
! region, and whether that's a good trade-off would again
! depend on the complexity and cost of the statements involved,
! as well as things like register pressure that are not normally
! modelled at this stage. We therefore ignore these cases
! and just optimize the clear single-use wins above.
!
! Thus we take the maximum precision of the unpromoted operands
! and record whether any operand is single-use. */
! if (unprom[i].dt == vect_internal_def)
! {
! min_precision = MAX (min_precision,
! TYPE_PRECISION (unprom[i].type));
! single_use_p |= op_single_use_p;
! }
! }
! }
! /* Although the operation could be done in operation_precision, we have
! to balance that against introducing extra truncations or extensions.
! Calculate the minimum precision that can be handled efficiently.
!
! The loop above determined that the operation could be handled
! efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
! extension from the inputs to the output without introducing more
! instructions, and would reduce the number of instructions required
! for STMT_INFO itself.
!
! vect_determine_precisions has also determined that the result only
! needs min_output_precision bits. Truncating by a factor of N times
! requires a tree of N - 1 instructions, so if TYPE is N times wider
! than min_output_precision, doing the operation in TYPE and truncating
! the result requires N + (N - 1) = 2N - 1 instructions per output vector.
! In contrast:
!
! - truncating the input to a unary operation and doing the operation
! in the new type requires at most N - 1 + 1 = N instructions per
! output vector
!
! - doing the same for a binary operation requires at most
! (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
!
! Both unary and binary operations require fewer instructions than
! this if the operands were extended from a suitable truncated form.
! Thus there is usually nothing to lose by doing operations in
! min_output_precision bits, but there can be something to gain. */
! if (!single_use_p)
! min_precision = last_stmt_info->min_output_precision;
! else
! min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
! /* Apply the minimum efficient precision we just calculated. */
! if (new_precision < min_precision)
! new_precision = min_precision;
! if (new_precision >= TYPE_PRECISION (type))
! return NULL;
! vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
! *type_out = get_vectype_for_scalar_type (type);
! if (!*type_out)
! return NULL;
! /* We've found a viable pattern. Get the new type of the operation. */
! bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
! tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
!
! /* We specifically don't check here whether the target supports the
! new operation, since it might be something that a later pattern
! wants to rewrite anyway. If targets have a minimum element size
! for some optabs, we should pattern-match smaller ops to larger ops
! where beneficial. */
! tree new_vectype = get_vectype_for_scalar_type (new_type);
! if (!new_vectype)
! return NULL;
! if (dump_enabled_p ())
{
! dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
! dump_printf (MSG_NOTE, " to ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
! dump_printf (MSG_NOTE, "\n");
}
! /* Calculate the rhs operands for an operation on NEW_TYPE. */
! STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
! tree ops[3] = {};
! for (unsigned int i = 1; i < first_op; ++i)
! ops[i - 1] = gimple_op (last_stmt, i);
! vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
! new_type, &unprom[0], new_vectype);
!
! /* Use the operation to produce a result of type NEW_TYPE. */
! tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
! gimple *pattern_stmt = gimple_build_assign (new_var, code,
! ops[0], ops[1], ops[2]);
! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
! if (dump_enabled_p ())
! {
! dump_printf_loc (MSG_NOTE, vect_location,
! "created pattern stmt: ");
! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
}
! pattern_stmt = vect_convert_output (last_stmt_info, type,
! pattern_stmt, new_vectype);
! stmts->safe_push (last_stmt);
! return pattern_stmt;
}
+ /* Recognize cases in which the input to a cast is wider than its
+ output, and the input is fed by a widening operation. Fold this
+ by removing the unnecessary intermediate widening. E.g.:
! unsigned char a;
! unsigned int b = (unsigned int) a;
! unsigned short c = (unsigned short) b;
! -->
! unsigned short c = (unsigned short) a;
! Although this is rare in input IR, it is an expected side-effect
! of the over-widening pattern above.
! This is beneficial also for integer-to-float conversions, if the
! widened integer has more bits than the float, and if the unwidened
! input doesn't. */
! static gimple *
! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
! {
! /* Check for a cast, including an integer-to-float conversion. */
! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
! if (!last_stmt)
! return NULL;
! tree_code code = gimple_assign_rhs_code (last_stmt);
! if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
! return NULL;
! /* Make sure that the rhs is a scalar with a natural bitsize. */
! tree lhs = gimple_assign_lhs (last_stmt);
! if (!lhs)
! return NULL;
! tree lhs_type = TREE_TYPE (lhs);
! scalar_mode lhs_mode;
! if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
! || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
! return NULL;
! /* Check for a narrowing operation (from a vector point of view). */
! tree rhs = gimple_assign_rhs1 (last_stmt);
! tree rhs_type = TREE_TYPE (rhs);
! if (!INTEGRAL_TYPE_P (rhs_type)
! || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
! || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
! return NULL;
! /* Try to find an unpromoted input. */
! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
! vec_info *vinfo = last_stmt_info->vinfo;
! vect_unpromoted_value unprom;
! if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
! || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
! return NULL;
! /* If the bits above RHS_TYPE matter, make sure that they're the
! same when extending from UNPROM as they are when extending from RHS. */
! if (!INTEGRAL_TYPE_P (lhs_type)
! && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
! return NULL;
! /* We can get the same result by casting UNPROM directly, to avoid
! the unnecessary widening and narrowing. */
! vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
! *type_out = get_vectype_for_scalar_type (lhs_type);
! if (!*type_out)
return NULL;
! tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
! gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+ stmts->safe_push (last_stmt);
return pattern_stmt;
}
*************** vect_recog_gather_scatter_pattern (vec<g
*** 4205,4210 ****
--- 4170,4559 ----
return pattern_stmt;
}
+ /* Return true if TYPE is a non-boolean integer type. These are the types
+ that we want to consider for narrowing. */
+
+ static bool
+ vect_narrowable_type_p (tree type)
+ {
+ return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+ }
+
+ /* Return true if the operation given by CODE can be truncated to N bits
+ when only N bits of the output are needed. This is only true if bit N+1
+ of the inputs has no effect on the low N bits of the result. */
+
+ static bool
+ vect_truncatable_operation_p (tree_code code)
+ {
+ switch (code)
+ {
+ case PLUS_EXPR:
+ case MINUS_EXPR:
+ case MULT_EXPR:
+ case BIT_AND_EXPR:
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ case COND_EXPR:
+ return true;
+
+ default:
+ return false;
+ }
+ }
+
+ /* Record that STMT_INFO could be changed from operating on TYPE to
+ operating on a type with the precision and sign given by PRECISION
+ and SIGN respectively. PRECISION is an arbitrary bit precision;
+ it might not be a whole number of bytes. */
+
+ static void
+ vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+ unsigned int precision, signop sign)
+ {
+ /* Round the precision up to a whole number of bytes. */
+ precision = vect_element_precision (precision);
+ if (precision < TYPE_PRECISION (type)
+ && (!stmt_info->operation_precision
+ || stmt_info->operation_precision > precision))
+ {
+ stmt_info->operation_precision = precision;
+ stmt_info->operation_sign = sign;
+ }
+ }
+
+ /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+ non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
+ is an arbitrary bit precision; it might not be a whole number of bytes. */
+
+ static void
+ vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+ unsigned int min_input_precision)
+ {
+ /* This operation in isolation only requires the inputs to have
+ MIN_INPUT_PRECISION bits of precision.  However, that doesn't mean
+ that MIN_INPUT_PRECISION is a natural precision for the chain
+ as a whole. E.g. consider something like:
+
+ unsigned short *x, *y;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ The right shift can be done on unsigned chars, and only requires the
+ result of "*x & 0xf0" to be done on unsigned chars. But taking that
+ approach would mean turning a natural chain of single-vector unsigned
+ short operations into one that truncates "*x" and then extends
+ "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+ operation and one vector for each unsigned char operation.
+ This would be a significant pessimization.
+
+ Instead only propagate the maximum of this precision and the precision
+ required by the users of the result. This means that we don't pessimize
+ the case above but continue to optimize things like:
+
+ unsigned char *y;
+ unsigned short *x;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ Here we would truncate two vectors of *x to a single vector of
+ unsigned chars and use single-vector unsigned char operations for
+ everything else, rather than doing two unsigned short copies of
+ "(*x & 0xf0) >> 4" and then truncating the result. */
+ min_input_precision = MAX (min_input_precision,
+ stmt_info->min_output_precision);
+
+ if (min_input_precision < TYPE_PRECISION (type)
+ && (!stmt_info->min_input_precision
+ || stmt_info->min_input_precision > min_input_precision))
+ stmt_info->min_input_precision = min_input_precision;
+ }
+
+ /* Subroutine of vect_determine_min_output_precision. Return true if
+ we can calculate a reduced number of output bits for STMT_INFO,
+ whose result is LHS. */
+
+ static bool
+ vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+ {
+ /* Take the maximum precision required by users of the result. */
+ unsigned int precision = 0;
+ imm_use_iterator iter;
+ use_operand_p use;
+ FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+ {
+ gimple *use_stmt = USE_STMT (use);
+ if (is_gimple_debug (use_stmt))
+ continue;
+ if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+ return false;
+ stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+ if (!use_stmt_info->min_input_precision)
+ return false;
+ precision = MAX (precision, use_stmt_info->min_input_precision);
+ }
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+ precision);
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+ dump_printf (MSG_NOTE, " are significant\n");
+ }
+ stmt_info->min_output_precision = precision;
+ return true;
+ }
+
+ /* Calculate min_output_precision for STMT_INFO. */
+
+ static void
+ vect_determine_min_output_precision (stmt_vec_info stmt_info)
+ {
+ /* We're only interested in statements with a narrowable result. */
+ tree lhs = gimple_get_lhs (stmt_info->stmt);
+ if (!lhs
+ || TREE_CODE (lhs) != SSA_NAME
+ || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+ return;
+
+ if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+ stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+ }
+
+ /* Use range information to decide whether STMT (described by STMT_INFO)
+ could be done in a narrower type. This is effectively a forward
+ propagation, since it uses context-independent information that applies
+ to all users of an SSA name. */
+
+ static void
+ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+ {
+ tree lhs = gimple_assign_lhs (stmt);
+ if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+ return;
+
+ tree type = TREE_TYPE (lhs);
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ /* First see whether we have any useful range information for the result. */
+ unsigned int precision = TYPE_PRECISION (type);
+ signop sign = TYPE_SIGN (type);
+ wide_int min_value, max_value;
+ if (!vect_get_range_info (lhs, &min_value, &max_value))
+ return;
+
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int nops = gimple_num_ops (stmt);
+
+ if (!vect_truncatable_operation_p (code))
+ /* Check that all relevant input operands are compatible, and update
+ [MIN_VALUE, MAX_VALUE] to include their ranges. */
+ for (unsigned int i = 1; i < nops; ++i)
+ {
+ tree op = gimple_op (stmt, i);
+ if (TREE_CODE (op) == INTEGER_CST)
+ {
+ /* Don't require the integer to have TYPE (which it might
+ not for things like shift amounts, etc.), but do require it
+ to fit the type. */
+ if (!int_fits_type_p (op, type))
+ return;
+
+ min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+ max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+ }
+ else if (TREE_CODE (op) == SSA_NAME)
+ {
+ /* Ignore codes that don't take uniform arguments. */
+ if (!types_compatible_p (TREE_TYPE (op), type))
+ return;
+
+ wide_int op_min_value, op_max_value;
+ if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+ return;
+
+ min_value = wi::min (min_value, op_min_value, sign);
+ max_value = wi::max (max_value, op_max_value, sign);
+ }
+ else
+ return;
+ }
+
+ /* Try to switch signed types for unsigned types if we can.
+ This is better for two reasons. First, unsigned ops tend
+ to be cheaper than signed ops. Second, it means that we can
+ handle things like:
+
+ signed char c;
+ int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+
+ as:
+
+ signed char c;
+ unsigned short res_1 = (unsigned short) c & 0xff00;
+ int res = (int) res_1;
+
+ where the intermediate result res_1 has unsigned rather than
+ signed type. */
+ if (sign == SIGNED && !wi::neg_p (min_value))
+ sign = UNSIGNED;
+
+ /* See what precision is required for MIN_VALUE and MAX_VALUE. */
+ unsigned int precision1 = wi::min_precision (min_value, sign);
+ unsigned int precision2 = wi::min_precision (max_value, sign);
+ unsigned int value_precision = MAX (precision1, precision2);
+ if (value_precision >= precision)
+ return;
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without loss of precision: ",
+ sign == SIGNED ? "signed" : "unsigned",
+ value_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+
+ vect_set_operation_type (stmt_info, type, value_precision, sign);
+ vect_set_min_input_precision (stmt_info, type, value_precision);
+ }
+
+ /* Use information about the users of STMT's result to decide whether
+ STMT (described by STMT_INFO) could be done in a narrower type.
+ This is effectively a backward propagation. */
+
+ static void
+ vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+ {
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int opno = (code == COND_EXPR ? 2 : 1);
+ tree type = TREE_TYPE (gimple_op (stmt, opno));
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ unsigned int precision = TYPE_PRECISION (type);
+ unsigned int operation_precision, min_input_precision;
+ switch (code)
+ {
+ CASE_CONVERT:
+ /* Only the bits that contribute to the output matter. Don't change
+ the precision of the operation itself. */
+ operation_precision = precision;
+ min_input_precision = stmt_info->min_output_precision;
+ break;
+
+ case LSHIFT_EXPR:
+ case RSHIFT_EXPR:
+ {
+ tree shift = gimple_assign_rhs2 (stmt);
+ if (TREE_CODE (shift) != INTEGER_CST
+ || !wi::ltu_p (wi::to_widest (shift), precision))
+ return;
+ unsigned int const_shift = TREE_INT_CST_LOW (shift);
+ if (code == LSHIFT_EXPR)
+ {
+ /* We need CONST_SHIFT fewer bits of the input. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = (MAX (operation_precision, const_shift)
+ - const_shift);
+ }
+ else
+ {
+ /* We need CONST_SHIFT extra bits to do the operation. */
+ operation_precision = (stmt_info->min_output_precision
+ + const_shift);
+ min_input_precision = operation_precision;
+ }
+ break;
+ }
+
+ default:
+ if (vect_truncatable_operation_p (code))
+ {
+ /* Input bit N has no effect on output bits N-1 and lower. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = operation_precision;
+ break;
+ }
+ return;
+ }
+
+ if (operation_precision < precision)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without affecting users: ",
+ TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+ operation_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+ vect_set_operation_type (stmt_info, type, operation_precision,
+ TYPE_SIGN (type));
+ }
+ vect_set_min_input_precision (stmt_info, type, min_input_precision);
+ }
+
+ /* Handle vect_determine_precisions for STMT_INFO, given that we
+ have already done so for the users of its result. */
+
+ void
+ vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+ {
+ vect_determine_min_output_precision (stmt_info);
+ if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+ {
+ vect_determine_precisions_from_range (stmt_info, stmt);
+ vect_determine_precisions_from_users (stmt_info, stmt);
+ }
+ }
+
+ /* Walk backwards through the vectorizable region to determine the
+ values of these fields:
+
+ - min_output_precision
+ - min_input_precision
+ - operation_precision
+ - operation_sign. */
+
+ void
+ vect_determine_precisions (vec_info *vinfo)
+ {
+ DUMP_VECT_SCOPE ("vect_determine_precisions");
+
+ if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+ unsigned int nbbs = loop->num_nodes;
+
+ for (unsigned int i = 0; i < nbbs; i++)
+ {
+ basic_block bb = bbs[nbbs - i - 1];
+ for (gimple_stmt_iterator si = gsi_last_bb (bb);
+ !gsi_end_p (si); gsi_prev (&si))
+ vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+ }
+ }
+ else
+ {
+ bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+ gimple_stmt_iterator si = bb_vinfo->region_end;
+ gimple *stmt;
+ do
+ {
+ if (!gsi_stmt (si))
+ si = gsi_last_bb (bb_vinfo->bb);
+ else
+ gsi_prev (&si);
+ stmt = gsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+ vect_determine_stmt_precisions (stmt_info);
+ }
+ while (stmt != gsi_stmt (bb_vinfo->region_begin));
+ }
+ }
+
typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
struct vect_recog_func
*************** struct vect_recog_func
*** 4217,4229 ****
taken, which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
- { vect_recog_over_widening_pattern, "over_widening" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
--- 4566,4579 ----
taken, which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
+ { vect_recog_over_widening_pattern, "over_widening" },
+ { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
*************** vect_pattern_recog (vec_info *vinfo)
*** 4497,4502 ****
--- 4847,4854 ----
unsigned int i, j;
auto_vec<gimple *, 1> stmts_to_replace;
+ vect_determine_precisions (vinfo);
+
DUMP_VECT_SCOPE ("vect_pattern_recog");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 62,69 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 62,70 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 58,64 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 58,66 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,63 ****
return 0;
}
! /* Final value stays in int, so no over-widening is detected at the moment. */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,68 ----
return 0;
}
! /* This is an over-widening even though the final result is still an int.
! It's better to do one vector of ops on chars and then widen than to
! widen and then do 4 vectors of ops on ints. */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,63 ****
return 0;
}
! /* Final value stays in int, so no over-widening is detected at the moment. */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,68 ----
return 0;
}
! /* This is an over-widening even though the final result is still an int.
! It's better to do one vector of ops on chars and then widen than to
! widen and then do 4 vectors of ops on ints. */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 57,62 ****
return 0;
}
! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 57,65 ----
return 0;
}
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 59,65 ****
return 0;
}
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 59,67 ----
return 0;
}
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 66,73 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 66,74 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-06-29 12:33:06.000000000 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-06-29 12:33:06.721263572 +0100
*************** int main (void)
*** 62,68 ****
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--- 62,70 ----
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,66 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ /* Deliberate use of signed >>. */
+ #define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char *restrict c) \
+ { \
+ a[0] = (b[0] + c[0]) >> 1; \
+ a[1] = (b[1] + c[1]) >> 1; \
+ a[2] = (b[2] + c[2]) >> 1; \
+ a[3] = (b[3] + c[3]) >> 1; \
+ a[4] = (b[4] + c[4]) >> 1; \
+ a[5] = (b[5] + c[5]) >> 1; \
+ a[6] = (b[6] + c[6]) >> 1; \
+ a[7] = (b[7] + c[7]) >> 1; \
+ a[8] = (b[8] + c[8]) >> 1; \
+ a[9] = (b[9] + c[9]) >> 1; \
+ a[10] = (b[10] + c[10]) >> 1; \
+ a[11] = (b[11] + c[11]) >> 1; \
+ a[12] = (b[12] + c[12]) >> 1; \
+ a[13] = (b[13] + c[13]) >> 1; \
+ a[14] = (b[14] + c[14]) >> 1; \
+ a[15] = (b[15] + c[15]) >> 1; \
+ }
+
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+
+ #define N 16
+
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
+ { \
+ SIGNEDNESS char a[N], b[N], c[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ c[i] = BASE_C + i * 14; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, c); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \
+ __builtin_abort (); \
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 10);
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,65 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ /* Deliberate use of signed >>. */
+ #define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char c) \
+ { \
+ a[0] = (b[0] + c) >> 1; \
+ a[1] = (b[1] + c) >> 1; \
+ a[2] = (b[2] + c) >> 1; \
+ a[3] = (b[3] + c) >> 1; \
+ a[4] = (b[4] + c) >> 1; \
+ a[5] = (b[5] + c) >> 1; \
+ a[6] = (b[6] + c) >> 1; \
+ a[7] = (b[7] + c) >> 1; \
+ a[8] = (b[8] + c) >> 1; \
+ a[9] = (b[9] + c) >> 1; \
+ a[10] = (b[10] + c) >> 1; \
+ a[11] = (b[11] + c) >> 1; \
+ a[12] = (b[12] + c) >> 1; \
+ a[13] = (b[13] + c) >> 1; \
+ a[14] = (b[14] + c) >> 1; \
+ a[15] = (b[15] + c) >> 1; \
+ }
+
+ DEF_LOOP (signed)
+ DEF_LOOP (unsigned)
+
+ #define N 16
+
+ #define TEST_LOOP(SIGNEDNESS, BASE_B, C) \
+ { \
+ SIGNEDNESS char a[N], b[N], c[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, C); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + C + i * 15) >> 1) \
+ __builtin_abort (); \
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 250);
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ /* Deliberate use of signed >>. */
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) >> 1;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+
+ #include "vect-over-widen-5.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #define D -120
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+ {
+ int promoted_d = d;
+ for (int i = 0; i < N; ++i)
+ /* Deliberate use of signed >>. */
+ a[i] = (b[i] + c[i] + promoted_d) >> 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, D);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #define D 251
+ #endif
+
+ #include "vect-over-widen-7.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-9.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,63 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -100
+ #endif
+
+ #define N 50
+
+ /* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short, with "res"
+ being extended for the store to d[i]. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, int *restrict d)
+ {
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ d[i] = res;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ int d[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ if (d[i] != res)
+ __builtin_abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-11.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+
+ #define N 50
+
+ /* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short. */
+ void __attribute__ ((noipa))
+ f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-13.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS signed
+ #define BASE_B -128
+ #define BASE_C -120
+ #endif
+
+ #define N 50
+
+ /* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short, with the result being extended to int for the
+ store. */
+ void __attribute__ ((noipa))
+ f (int *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+ {
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ int a[N];
+ SIGNEDNESS char b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #ifndef SIGNEDNESS
+ #define SIGNEDNESS unsigned
+ #define BASE_B 4
+ #define BASE_C 40
+ #endif
+
+ #include "vect-over-widen-15.c"
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 1024
+
+ /* This should not be treated as an over-widening pattern, even though
+    "((b[i] & 0xef) | 0x80)" could be done in unsigned chars.  */
+
+ void __attribute__ ((noipa))
+ f (unsigned short *restrict a, unsigned short *restrict b)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned short a[N], b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 1024
+
+ /* This should be treated as an over-widening pattern: we can truncate
+ b to unsigned char after loading it and do all the computation in
+ unsigned char. */
+
+ void __attribute__ ((noipa))
+ f (unsigned char *restrict a, unsigned short *restrict b)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned char a[N];
+ unsigned short b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+ /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+ /* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+ /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned int di = d[i];
+ unsigned int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ int di = d[i];
+ int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
*** /dev/null 2018-06-13 14:36:57.192460992 +0100
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c 2018-06-29 12:33:06.721263572 +0100
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_shift } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+
+ #include "tree-vect.h"
+
+ #define N 111
+
+ /* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+ void __attribute__ ((noipa))
+ f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+ {
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ a[i] = d[i];
+ b[i] = e[i];
+ c[i] = d[i] + e[i];
+ }
+ }
+
+ int
+ main (void)
+ {
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
* Re: [14/n] PR85694: Rework overwidening detection
2018-06-29 12:56 ` Richard Sandiford
@ 2018-07-02 11:02 ` Christophe Lyon
2018-07-02 13:37 ` Richard Sandiford
2018-07-02 13:12 ` Richard Biener
1 sibling, 1 reply; 10+ messages in thread
From: Christophe Lyon @ 2018-07-02 11:02 UTC (permalink / raw)
To: gcc Patches, Richard Sandiford
On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > This patch is the main part of PR85694. The aim is to recognise at least:
> >
> > signed char *a, *b, *c;
> > ...
> > for (int i = 0; i < 2048; i++)
> > c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints. However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant. These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands. These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> > (effectively a forward walk, since the range info is
> > context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> > users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code. The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> > and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>
> Here's a version rebased on top of current trunk. Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
> prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before. OK to install?
>
Hi Richard,
This patch introduces regressions on arm-none-linux-gnueabihf:
gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
Christophe
> Richard
>
>
> 2018-06-29 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * poly-int.h (print_hex): New function.
> * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
> * dumpfile.c (dump_generic_expr): Fix formatting.
> (dump_generic_expr_loc): New function.
> (dump_dec, dump_hex): New poly_wide_int functions.
> * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
> min_input_precision, operation_precision and operation_sign.
> * tree-vect-patterns.c (vect_get_range_info): New function.
> (vect_same_loop_or_bb_p, vect_single_imm_use)
> (vect_operation_fits_smaller_type): Delete.
> (vect_look_through_possible_promotion): Add an optional
> single_use_p parameter.
> (vect_recog_over_widening_pattern): Rewrite to use new
> stmt_vec_info information.  Handle one operation at a time.
> (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
> (vect_truncatable_operation_p, vect_set_operation_type)
> (vect_set_min_input_precision): New functions.
> (vect_determine_min_output_precision_1): Likewise.
> (vect_determine_min_output_precision): Likewise.
> (vect_determine_precisions_from_range): Likewise.
> (vect_determine_precisions_from_users): Likewise.
> (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
> (vect_vect_recog_func_ptrs): Put over_widening first.
> Add cast_forwprop.
> (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
> over-widening messages.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
> * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-5.c: Likewise.
> * gcc.dg/vect/vect-over-widen-6.c: Likewise.
> * gcc.dg/vect/vect-over-widen-7.c: Likewise.
> * gcc.dg/vect/vect-over-widen-8.c: Likewise.
> * gcc.dg/vect/vect-over-widen-9.c: Likewise.
> * gcc.dg/vect/vect-over-widen-10.c: Likewise.
> * gcc.dg/vect/vect-over-widen-11.c: Likewise.
> * gcc.dg/vect/vect-over-widen-12.c: Likewise.
> * gcc.dg/vect/vect-over-widen-13.c: Likewise.
> * gcc.dg/vect/vect-over-widen-14.c: Likewise.
> * gcc.dg/vect/vect-over-widen-15.c: Likewise.
> * gcc.dg/vect/vect-over-widen-16.c: Likewise.
> * gcc.dg/vect/vect-over-widen-17.c: Likewise.
> * gcc.dg/vect/vect-over-widen-18.c: Likewise.
> * gcc.dg/vect/vect-over-widen-19.c: Likewise.
> * gcc.dg/vect/vect-over-widen-20.c: Likewise.
> * gcc.dg/vect/vect-over-widen-21.c: Likewise.
>
> Index: gcc/poly-int.h
> ===================================================================
> *** gcc/poly-int.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/poly-int.h 2018-06-29 12:33:06.721263572 +0100
> *************** print_dec (const poly_int_pod<N, C> &val
> *** 2420,2425 ****
> --- 2420,2444 ----
> poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
> }
>
> + /* Use print_hex to print VALUE to FILE. */
> +
> + template<unsigned int N, typename C>
> + void
> + print_hex (const poly_int_pod<N, C> &value, FILE *file)
> + {
> + if (value.is_constant ())
> + print_hex (value.coeffs[0], file);
> + else
> + {
> + fprintf (file, "[");
> + for (unsigned int i = 0; i < N; ++i)
> + {
> + print_hex (value.coeffs[i], file);
> + fputc (i == N - 1 ? ']' : ',', file);
> + }
> + }
> + }
> +
> /* Helper for calculating the distance between two points P1 and P2,
> in cases where known_le (P1, P2). T1 and T2 are the types of the
> two positions, in either order. The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> *** gcc/dumpfile.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.h 2018-06-29 12:33:06.717263602 +0100
> *************** extern void dump_printf_loc (dump_flags_
> *** 425,430 ****
> --- 425,432 ----
> const char *, ...) ATTRIBUTE_PRINTF_3;
> extern void dump_function (int phase, tree fn);
> extern void dump_basic_block (dump_flags_t, basic_block, int);
> + extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
> + dump_flags_t, tree);
> extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
> extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
> dump_flags_t, gimple *, int);
> *************** extern bool enable_rtl_dump_file (void);
> *** 434,439 ****
> --- 436,443 ----
>
> template<unsigned int N, typename C>
> void dump_dec (dump_flags_t, const poly_int<N, C> &);
> + extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> + extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
> /* In tree-dump.c */
> extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> *** gcc/dumpfile.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.c 2018-06-29 12:33:06.717263602 +0100
> *************** dump_generic_expr (dump_flags_t dump_kin
> *** 498,507 ****
> --- 498,527 ----
> tree t)
> {
> if (dump_file && (dump_kind & pflags))
> + print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> +
> + /* Similar to dump_generic_expr, but additionally print the source
> + location. */

> +
> + void
> + dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
> + dump_flags_t extra_dump_flags, tree t)
> + {
> + location_t srcloc = loc.get_location_t ();
> + if (dump_file && (dump_kind & pflags))
> + {
> + dump_loc (dump_kind, dump_file, srcloc);
> print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> + }
>
> if (alt_dump_file && (dump_kind & alt_flags))
> + {
> + dump_loc (dump_kind, alt_dump_file, srcloc);
> print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> }
>
> /* Output a formatted message using FORMAT on appropriate dump streams. */
> *************** template void dump_dec (dump_flags_t, co
> *** 573,578 ****
> --- 593,620 ----
> template void dump_dec (dump_flags_t, const poly_offset_int &);
> template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> + void
> + dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> + {
> + if (dump_file && (dump_kind & pflags))
> + print_dec (value, dump_file, sgn);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_dec (value, alt_dump_file, sgn);
> + }
> +
> + /* Output VALUE in hexadecimal to appropriate dump streams. */
> +
> + void
> + dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> + {
> + if (dump_file && (dump_kind & pflags))
> + print_hex (value, dump_file);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_hex (value, alt_dump_file);
> + }
> +
> /* Start a dump for PHASE. Store user-supplied dump flags in
> *FLAG_PTR. Return the number of streams opened. Set globals
> DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vectorizer.h 2018-06-29 12:33:06.725263540 +0100
> *************** typedef struct _stmt_vec_info {
> *** 899,904 ****
> --- 899,919 ----
>
> /* The number of scalar stmt references from active SLP instances. */
> unsigned int num_slp_uses;
> +
> + /* If nonzero, the lhs of the statement could be truncated to this
> + many bits without affecting any users of the result. */
> + unsigned int min_output_precision;
> +
> + /* If nonzero, all non-boolean input operands have the same precision,
> + and they could each be truncated to this many bits without changing
> + the result. */
> + unsigned int min_input_precision;
> +
> + /* If OPERATION_PRECISION is nonzero, the statement could be performed on
> + an integer with the sign and number of bits given by OPERATION_SIGN
> + and OPERATION_PRECISION without changing the result. */
> + unsigned int operation_precision;
> + signop operation_sign;
> } *stmt_vec_info;
>
> /* Information about a gather/scatter call. */
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vect-patterns.c 2018-06-29 12:33:06.721263572 +0100
> *************** Software Foundation; either version 3, o
> *** 47,52 ****
> --- 47,86 ----
> #include "omp-simd-clone.h"
> #include "predict.h"
>
> + /* Return true if we have a useful VR_RANGE range for VAR, storing it
> + in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
> +
> + static bool
> + vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> + {
> + value_range_type vr_type = get_range_info (var, min_value, max_value);
> + wide_int nonzero = get_nonzero_bits (var);
> + signop sgn = TYPE_SIGN (TREE_TYPE (var));
> + if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> + nonzero, sgn) == VR_RANGE)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has range [");
> + dump_hex (MSG_NOTE, *min_value);
> + dump_printf (MSG_NOTE, ", ");
> + dump_hex (MSG_NOTE, *max_value);
> + dump_printf (MSG_NOTE, "]\n");
> + }
> + return true;
> + }
> + else
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has no range info\n");
> + }
> + return false;
> + }
> + }
> +
> /* Report that we've found an instance of pattern PATTERN in
> statement STMT. */
>
> *************** vect_supportable_direct_optab_p (tree ot
> *** 190,229 ****
> return true;
> }
>
> - /* Check whether STMT2 is in the same loop or basic block as STMT1.
> - Which of the two applies depends on whether we're currently doing
> - loop-based or basic-block-based vectorization, as determined by
> - the vinfo_for_stmt for STMT1 (which must be defined).
> -
> - If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> - to be defined as well. */
> -
> - static bool
> - vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> - {
> - stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> - return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> - }
> -
> - /* If the LHS of DEF_STMT has a single use, and that statement is
> - in the same loop or basic block, return it. */
> -
> - static gimple *
> - vect_single_imm_use (gimple *def_stmt)
> - {
> - tree lhs = gimple_assign_lhs (def_stmt);
> - use_operand_p use_p;
> - gimple *use_stmt;
> -
> - if (!single_imm_use (lhs, &use_p, &use_stmt))
> - return NULL;
> -
> - if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> - return NULL;
> -
> - return use_stmt;
> - }
> -
> /* Round bit precision PRECISION up to a full element. */
>
> static unsigned int
> --- 224,229 ----
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 347,353 ****
> is possible to convert OP' back to OP using a possible sign change
> followed by a possible promotion P. Return this OP', or null if OP is
> not a vectorizable SSA name. If there is a promotion P, describe its
> ! input in UNPROM, otherwise describe OP' in UNPROM.
>
> A successful return means that it is possible to go from OP' to OP
> via UNPROM. The cast from OP' to UNPROM is at most a sign change,
> --- 347,355 ----
> is possible to convert OP' back to OP using a possible sign change
> followed by a possible promotion P. Return this OP', or null if OP is
> not a vectorizable SSA name. If there is a promotion P, describe its
> ! input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
> ! is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> ! have more than one user.
>
> A successful return means that it is possible to go from OP' to OP
> via UNPROM. The cast from OP' to UNPROM is at most a sign change,
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 374,380 ****
>
> static tree
> vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> ! vect_unpromoted_value *unprom)
> {
> tree res = NULL_TREE;
> tree op_type = TREE_TYPE (op);
> --- 376,383 ----
>
> static tree
> vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> ! vect_unpromoted_value *unprom,
> ! bool *single_use_p = NULL)
> {
> tree res = NULL_TREE;
> tree op_type = TREE_TYPE (op);
> *************** vect_look_through_possible_promotion (ve
> *** 420,426 ****
> if (!def_stmt)
> break;
> if (dt == vect_internal_def)
> ! caster = vinfo_for_stmt (def_stmt);
> else
> caster = NULL;
> gassign *assign = dyn_cast <gassign *> (def_stmt);
> --- 423,436 ----
> if (!def_stmt)
> break;
> if (dt == vect_internal_def)
> ! {
> ! caster = vinfo_for_stmt (def_stmt);
> ! /* Ignore pattern statements, since we don't link uses for them. */
> ! if (single_use_p
> ! && !STMT_VINFO_RELATED_STMT (caster)
> ! && !has_single_use (res))
> ! *single_use_p = false;
> ! }
> else
> caster = NULL;
> gassign *assign = dyn_cast <gassign *> (def_stmt);
> *************** vect_recog_widen_sum_pattern (vec<gimple
> *** 1371,1733 ****
> return pattern_stmt;
> }
>
>
> ! /* Return TRUE if the operation in STMT can be performed on a smaller type.
>
> ! Input:
> ! STMT - a statement to check.
> ! DEF - we support operations with two operands, one of which is constant.
> ! The other operand can be defined by a demotion operation, or by a
> ! previous statement in a sequence of over-promoted operations. In the
> ! later case DEF is used to replace that operand. (It is defined by a
> ! pattern statement we created for the previous statement in the
> ! sequence).
> !
> ! Input/output:
> ! NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
> ! NULL, it's the type of DEF.
> ! STMTS - additional pattern statements. If a pattern statement (type
> ! conversion) is created in this function, its original statement is
> ! added to STMTS.
>
> ! Output:
> ! OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> ! operands to use in the new pattern statement for STMT (will be created
> ! in vect_recog_over_widening_pattern ()).
> ! NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> ! statements for STMT: the first one is a type promotion and the second
> ! one is the operation itself. We return the type promotion statement
> ! in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> ! the second pattern statement. */
>
> ! static bool
> ! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> ! tree *op0, tree *op1, gimple **new_def_stmt,
> ! vec<gimple *> *stmts)
> ! {
> ! enum tree_code code;
> ! tree const_oprnd, oprnd;
> ! tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> ! gimple *def_stmt, *new_stmt;
> ! bool first = false;
> ! bool promotion;
>
> ! *op0 = NULL_TREE;
> ! *op1 = NULL_TREE;
> ! *new_def_stmt = NULL;
>
> ! if (!is_gimple_assign (stmt))
> ! return false;
>
> ! code = gimple_assign_rhs_code (stmt);
> ! if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> ! && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> ! return false;
>
> ! oprnd = gimple_assign_rhs1 (stmt);
> ! const_oprnd = gimple_assign_rhs2 (stmt);
> ! type = gimple_expr_type (stmt);
>
> ! if (TREE_CODE (oprnd) != SSA_NAME
> ! || TREE_CODE (const_oprnd) != INTEGER_CST)
> ! return false;
>
> ! /* If oprnd has other uses besides that in stmt we cannot mark it
> ! as being part of a pattern only. */
> ! if (!has_single_use (oprnd))
> ! return false;
>
> ! /* If we are in the middle of a sequence, we use DEF from a previous
> ! statement. Otherwise, OPRND has to be a result of type promotion. */
> ! if (*new_type)
> ! {
> ! half_type = *new_type;
> ! oprnd = def;
> ! }
> ! else
> {
> ! first = true;
> ! if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> ! &promotion)
> ! || !promotion
> ! || !vect_same_loop_or_bb_p (stmt, def_stmt))
> ! return false;
> }
>
> ! /* Can we perform the operation on a smaller type? */
> ! switch (code)
> ! {
> ! case BIT_IOR_EXPR:
> ! case BIT_XOR_EXPR:
> ! case BIT_AND_EXPR:
> ! if (!int_fits_type_p (const_oprnd, half_type))
> ! {
> ! /* HALF_TYPE is not enough. Try a bigger type if possible. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> ! if (!int_fits_type_p (const_oprnd, interm_type))
> ! return false;
> ! }
> !
> ! break;
> !
> ! case LSHIFT_EXPR:
> ! /* Try intermediate type - HALF_TYPE is not enough for sure. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> ! (e.g., if the original value was char, the shift amount is at most 8
> ! if we want to use short). */
> ! if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> ! if (!vect_supportable_shift (code, interm_type))
> ! return false;
> !
> ! break;
> !
> ! case RSHIFT_EXPR:
> ! if (vect_supportable_shift (code, half_type))
> ! break;
> !
> ! /* Try intermediate type - HALF_TYPE is not supported. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> ! if (!vect_supportable_shift (code, interm_type))
> ! return false;
> !
> ! break;
> !
> ! default:
> ! gcc_unreachable ();
> ! }
> !
> ! /* There are four possible cases:
> ! 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> ! the first statement in the sequence)
> ! a. The original, HALF_TYPE, is not enough - we replace the promotion
> ! from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> ! b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> ! promotion.
> ! 2. OPRND is defined by a pattern statement we created.
> ! a. Its type is not sufficient for the operation, we create a new stmt:
> ! a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
> ! this statement in NEW_DEF_STMT, and it is later put in
> ! STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> ! b. OPRND is good to use in the new statement. */
> ! if (first)
> ! {
> ! if (interm_type)
> ! {
> ! /* Replace the original type conversion HALF_TYPE->TYPE with
> ! HALF_TYPE->INTERM_TYPE. */
> ! if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> ! {
> ! new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> ! /* Check if the already created pattern stmt is what we need. */
> ! if (!is_gimple_assign (new_stmt)
> ! || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> ! || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> ! return false;
> !
> ! stmts->safe_push (def_stmt);
> ! oprnd = gimple_assign_lhs (new_stmt);
> ! }
> ! else
> ! {
> ! /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
> ! oprnd = gimple_assign_rhs1 (def_stmt);
> ! new_oprnd = make_ssa_name (interm_type);
> ! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> ! stmts->safe_push (def_stmt);
> ! oprnd = new_oprnd;
> ! }
> ! }
> ! else
> ! {
> ! /* Retrieve the operand before the type promotion. */
> ! oprnd = gimple_assign_rhs1 (def_stmt);
> ! }
> ! }
> ! else
> ! {
> ! if (interm_type)
> ! {
> ! /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
> ! new_oprnd = make_ssa_name (interm_type);
> ! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> ! oprnd = new_oprnd;
> ! *new_def_stmt = new_stmt;
> ! }
>
> ! /* Otherwise, OPRND is already set. */
> }
>
> ! if (interm_type)
> ! *new_type = interm_type;
> ! else
> ! *new_type = half_type;
>
> ! *op0 = oprnd;
> ! *op1 = fold_convert (*new_type, const_oprnd);
> !
> ! return true;
> }
>
>
> ! /* Try to find a statement or a sequence of statements that can be performed
> ! on a smaller type:
>
> ! type x_t;
> ! TYPE x_T, res0_T, res1_T;
> ! loop:
> ! S1 x_t = *p;
> ! S2 x_T = (TYPE) x_t;
> ! S3 res0_T = op (x_T, C0);
> ! S4 res1_T = op (res0_T, C1);
> ! S5 ... = () res1_T; - type demotion
> !
> ! where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> ! constants.
> ! Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
> ! be 'type' or some intermediate type. For now, we expect S5 to be a type
> ! demotion operation. We also check that S3 and S4 have only one use. */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! gimple *stmt = stmts->pop ();
> ! gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> ! *use_stmt = NULL;
> ! tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> ! tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> ! bool first;
> ! tree type = NULL;
> !
> ! first = true;
> ! while (1)
> ! {
> ! if (!vinfo_for_stmt (stmt)
> ! || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> ! return NULL;
> !
> ! new_def_stmt = NULL;
> ! if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> ! &op0, &op1, &new_def_stmt,
> ! stmts))
> ! {
> ! if (first)
> ! return NULL;
> ! else
> ! break;
> ! }
>
> ! /* STMT can be performed on a smaller type. Check its uses. */
> ! use_stmt = vect_single_imm_use (stmt);
> ! if (!use_stmt || !is_gimple_assign (use_stmt))
> ! return NULL;
> !
> ! /* Create pattern statement for STMT. */
> ! vectype = get_vectype_for_scalar_type (new_type);
> ! if (!vectype)
> ! return NULL;
> !
> ! /* We want to collect all the statements for which we create pattern
> ! statements, except for the case when the last statement in the
> ! sequence doesn't have a corresponding pattern statement. In such
> ! case we associate the last pattern statement with the last statement
> ! in the sequence. Therefore, we only add the original statement to
> ! the list if we know that it is not the last. */
> ! if (prev_stmt)
> ! stmts->safe_push (prev_stmt);
>
> ! var = vect_recog_temp_ssa_var (new_type, NULL);
> ! pattern_stmt
> ! = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> ! new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
>
> ! if (dump_enabled_p ())
> ! {
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "created pattern stmt: ");
> ! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> ! }
>
> ! type = gimple_expr_type (stmt);
> ! prev_stmt = stmt;
> ! stmt = use_stmt;
> !
> ! first = false;
> ! }
> !
> ! /* We got a sequence. We expect it to end with a type demotion operation.
> ! Otherwise, we quit (for now). There are three possible cases: the
> ! conversion is to NEW_TYPE (we don't do anything), the conversion is to
> ! a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> ! NEW_TYPE differs (we create a new conversion statement). */
> ! if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> ! {
> ! use_lhs = gimple_assign_lhs (use_stmt);
> ! use_type = TREE_TYPE (use_lhs);
> ! /* Support only type demotion or signedness change. */
> ! if (!INTEGRAL_TYPE_P (use_type)
> ! || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> ! return NULL;
>
> ! /* Check that NEW_TYPE is not bigger than the conversion result. */
> ! if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> ! return NULL;
>
> ! if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> ! || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> ! {
> ! *type_out = get_vectype_for_scalar_type (use_type);
> ! if (!*type_out)
> ! return NULL;
>
> ! /* Create NEW_TYPE->USE_TYPE conversion. */
> ! new_oprnd = make_ssa_name (use_type);
> ! pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> !
> ! /* We created a pattern statement for the last statement in the
> ! sequence, so we don't need to associate it with the pattern
> ! statement created for PREV_STMT. Therefore, we add PREV_STMT
> ! to the list in order to mark it later in vect_pattern_recog_1. */
> ! if (prev_stmt)
> ! stmts->safe_push (prev_stmt);
> ! }
> ! else
> ! {
> ! if (prev_stmt)
> ! STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> ! = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
>
> ! *type_out = vectype;
> ! }
>
> ! stmts->safe_push (use_stmt);
> ! }
> ! else
> ! /* TODO: support general case, create a conversion to the correct type. */
> return NULL;
>
> ! /* Pattern detected. */
> ! vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
>
> return pattern_stmt;
> }
>
> --- 1381,1698 ----
> return pattern_stmt;
> }
>
> + /* Recognize cases in which an operation is performed in one type WTYPE
> + but could be done more efficiently in a narrower type NTYPE. For example,
> + if we have:
> +
> + ATYPE a; // narrower than NTYPE
> + BTYPE b; // narrower than NTYPE
> + WTYPE aw = (WTYPE) a;
> + WTYPE bw = (WTYPE) b;
> + WTYPE res = aw + bw; // only uses of aw and bw
> +
> + then it would be more efficient to do:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b;
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + Other situations include things like:
> +
> + ATYPE a; // NTYPE or narrower
> + WTYPE aw = (WTYPE) a;
> + WTYPE res = aw + b;
> +
> + when only "(NTYPE) res" is significant. In that case it's more efficient
> + to truncate "b" and do the operation on NTYPE instead:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b; // truncation
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + All users of "res" should then use "resn" instead, making the final
> + statement dead (not marked as relevant). The final statement is still
> + needed to maintain the type correctness of the IR.
> +
> + vect_determine_precisions has already determined the minimum
> + precision of the operation and the minimum precision required
> + by users of the result. */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> ! if (!last_stmt)
> ! return NULL;
>
> ! /* See whether we have found that this operation can be done on a
> ! narrower type without changing its semantics. */
> ! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> ! unsigned int new_precision = last_stmt_info->operation_precision;
> ! if (!new_precision)
> ! return NULL;
>
> ! vec_info *vinfo = last_stmt_info->vinfo;
> ! tree lhs = gimple_assign_lhs (last_stmt);
> ! tree type = TREE_TYPE (lhs);
> ! tree_code code = gimple_assign_rhs_code (last_stmt);
> !
> ! /* Keep the first operand of a COND_EXPR as-is: only the other two
> ! operands are interesting. */
> ! unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> ! /* Check the operands. */
> ! unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> ! auto_vec <vect_unpromoted_value, 3> unprom (nops);
> ! unprom.quick_grow (nops);
> ! unsigned int min_precision = 0;
> ! bool single_use_p = false;
> ! for (unsigned int i = 0; i < nops; ++i)
> ! {
> ! tree op = gimple_op (last_stmt, first_op + i);
> ! if (TREE_CODE (op) == INTEGER_CST)
> ! unprom[i].set_op (op, vect_constant_def);
> ! else if (TREE_CODE (op) == SSA_NAME)
> ! {
> ! bool op_single_use_p = true;
> ! if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> ! &op_single_use_p))
> ! return NULL;
> ! /* If:
>
> ! (1) N bits of the result are needed;
> ! (2) all inputs are widened from M<N bits; and
> ! (3) one operand OP is a single-use SSA name
> !
> ! we can shift the M->N widening from OP to the output
> ! without changing the number or type of extensions involved.
> ! This then reduces the number of copies of STMT_INFO.
> !
> ! If instead of (3) more than one operand is a single-use SSA name,
> ! shifting the extension to the output is even more of a win.
> !
> ! If instead:
> !
> ! (1) N bits of the result are needed;
> ! (2) one operand OP2 is widened from M2<N bits;
> ! (3) another operand OP1 is widened from M1<M2 bits; and
> ! (4) both OP1 and OP2 are single-use
> !
> ! the choice is between:
> !
> ! (a) truncating OP2 to M1, doing the operation on M1,
> ! and then widening the result to N
> !
> ! (b) widening OP1 to M2, doing the operation on M2, and then
> ! widening the result to N
> !
> ! Both shift the M2->N widening of the inputs to the output.
> ! (a) additionally shifts the M1->M2 widening to the output;
> ! it requires fewer copies of STMT_INFO but requires an extra
> ! M2->M1 truncation.
> !
> ! Which is better will depend on the complexity and cost of
> ! STMT_INFO, which is hard to predict at this stage. However,
> ! a clear tie-breaker in favor of (b) is the fact that the
> ! truncation in (a) increases the length of the operation chain.
> !
> ! If instead of (4) only one of OP1 or OP2 is single-use,
> ! (b) is still a win over doing the operation in N bits:
> ! it still shifts the M2->N widening on the single-use operand
> ! to the output and reduces the number of STMT_INFO copies.
> !
> ! If neither operand is single-use then operating on fewer than
> ! N bits might lead to more extensions overall. Whether it does
> ! or not depends on global information about the vectorization
> ! region, and whether that's a good trade-off would again
> ! depend on the complexity and cost of the statements involved,
> ! as well as things like register pressure that are not normally
> ! modelled at this stage. We therefore ignore these cases
> ! and just optimize the clear single-use wins above.
> !
> ! Thus we take the maximum precision of the unpromoted operands
> ! and record whether any operand is single-use. */
> ! if (unprom[i].dt == vect_internal_def)
> ! {
> ! min_precision = MAX (min_precision,
> ! TYPE_PRECISION (unprom[i].type));
> ! single_use_p |= op_single_use_p;
> ! }
> ! }
> ! }
>
> ! /* Although the operation could be done in operation_precision, we have
> ! to balance that against introducing extra truncations or extensions.
> ! Calculate the minimum precision that can be handled efficiently.
> !
> ! The loop above determined that the operation could be handled
> ! efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> ! extension from the inputs to the output without introducing more
> ! instructions, and would reduce the number of instructions required
> ! for STMT_INFO itself.
> !
> ! vect_determine_precisions has also determined that the result only
> ! needs min_output_precision bits. Truncating by a factor of N
> ! requires a tree of N - 1 instructions, so if TYPE is N times wider
> ! than min_output_precision, doing the operation in TYPE and truncating
> ! the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> ! In contrast:
> !
> ! - truncating the input to a unary operation and doing the operation
> ! in the new type requires at most N - 1 + 1 = N instructions per
> ! output vector
> !
> ! - doing the same for a binary operation requires at most
> ! (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> !
> ! Both unary and binary operations require fewer instructions than
> ! this if the operands were extended from a suitable truncated form.
> ! Thus there is usually nothing to lose by doing operations in
> ! min_output_precision bits, but there can be something to gain. */
> ! if (!single_use_p)
> ! min_precision = last_stmt_info->min_output_precision;
> ! else
> ! min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
> ! /* Apply the minimum efficient precision we just calculated. */
> ! if (new_precision < min_precision)
> ! new_precision = min_precision;
> ! if (new_precision >= TYPE_PRECISION (type))
> ! return NULL;
>
> ! vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> ! *type_out = get_vectype_for_scalar_type (type);
> ! if (!*type_out)
> ! return NULL;
>
> ! /* We've found a viable pattern. Get the new type of the operation. */
> ! bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> ! tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> !
> ! /* We specifically don't check here whether the target supports the
> ! new operation, since it might be something that a later pattern
> ! wants to rewrite anyway. If targets have a minimum element size
> ! for some optabs, we should pattern-match smaller ops to larger ops
> ! where beneficial. */
> ! tree new_vectype = get_vectype_for_scalar_type (new_type);
> ! if (!new_vectype)
> ! return NULL;
>
> ! if (dump_enabled_p ())
> {
> ! dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> ! dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> ! dump_printf (MSG_NOTE, " to ");
> ! dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> ! dump_printf (MSG_NOTE, "\n");
> }
>
> ! /* Calculate the rhs operands for an operation on NEW_TYPE. */
> ! STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> ! tree ops[3] = {};
> ! for (unsigned int i = 1; i < first_op; ++i)
> ! ops[i - 1] = gimple_op (last_stmt, i);
> ! vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> ! new_type, &unprom[0], new_vectype);
> !
> ! /* Use the operation to produce a result of type NEW_TYPE. */
> ! tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> ! gimple *pattern_stmt = gimple_build_assign (new_var, code,
> ! ops[0], ops[1], ops[2]);
> ! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> ! if (dump_enabled_p ())
> ! {
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "created pattern stmt: ");
> ! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> }
>
> ! pattern_stmt = vect_convert_output (last_stmt_info, type,
> ! pattern_stmt, new_vectype);
>
> ! stmts->safe_push (last_stmt);
> ! return pattern_stmt;
> }
>
> + /* Recognize cases in which the input to a cast is wider than its
> + output, and the input is fed by a widening operation. Fold this
> + by removing the unnecessary intermediate widening. E.g.:
>
> ! unsigned char a;
> ! unsigned int b = (unsigned int) a;
> ! unsigned short c = (unsigned short) b;
>
> ! -->
>
> ! unsigned short c = (unsigned short) a;
>
> ! Although this is rare in input IR, it is an expected side-effect
> ! of the over-widening pattern above.
>
> ! This is beneficial also for integer-to-float conversions, if the
> ! widened integer has more bits than the float, and if the unwidened
> ! input doesn't. */
>
> ! static gimple *
> ! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! /* Check for a cast, including an integer-to-float conversion. */
> ! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> ! if (!last_stmt)
> ! return NULL;
> ! tree_code code = gimple_assign_rhs_code (last_stmt);
> ! if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> ! return NULL;
>
> ! /* Make sure that the rhs is a scalar with a natural bitsize. */
> ! tree lhs = gimple_assign_lhs (last_stmt);
> ! if (!lhs)
> ! return NULL;
> ! tree lhs_type = TREE_TYPE (lhs);
> ! scalar_mode lhs_mode;
> ! if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> ! || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> ! return NULL;
>
> ! /* Check for a narrowing operation (from a vector point of view). */
> ! tree rhs = gimple_assign_rhs1 (last_stmt);
> ! tree rhs_type = TREE_TYPE (rhs);
> ! if (!INTEGRAL_TYPE_P (rhs_type)
> ! || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> ! || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> ! return NULL;
>
> ! /* Try to find an unpromoted input. */
> ! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> ! vec_info *vinfo = last_stmt_info->vinfo;
> ! vect_unpromoted_value unprom;
> ! if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> ! || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> ! return NULL;
>
> ! /* If the bits above RHS_TYPE matter, make sure that they're the
> ! same when extending from UNPROM as they are when extending from RHS. */
> ! if (!INTEGRAL_TYPE_P (lhs_type)
> ! && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> ! return NULL;
>
> ! /* We can get the same result by casting UNPROM directly, to avoid
> ! the unnecessary widening and narrowing. */
> ! vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> ! *type_out = get_vectype_for_scalar_type (lhs_type);
> ! if (!*type_out)
> return NULL;
>
> ! tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> ! gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
> ! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> + stmts->safe_push (last_stmt);
> return pattern_stmt;
> }
>
> *************** vect_recog_gather_scatter_pattern (vec<g
> *** 4205,4210 ****
> --- 4170,4559 ----
> return pattern_stmt;
> }
>
> + /* Return true if TYPE is a non-boolean integer type. These are the types
> + that we want to consider for narrowing. */
> +
> + static bool
> + vect_narrowable_type_p (tree type)
> + {
> + return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> + }
> +
> + /* Return true if the operation given by CODE can be truncated to N bits
> + when only N bits of the output are needed. This is only true if bit N+1
> + of the inputs has no effect on the low N bits of the result. */
> +
> + static bool
> + vect_truncatable_operation_p (tree_code code)
> + {
> + switch (code)
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> + case MULT_EXPR:
> + case BIT_AND_EXPR:
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + case COND_EXPR:
> + return true;
> +
> + default:
> + return false;
> + }
> + }
> +
> + /* Record that STMT_INFO could be changed from operating on TYPE to
> + operating on a type with the precision and sign given by PRECISION
> + and SIGN respectively. PRECISION is an arbitrary bit precision;
> + it might not be a whole number of bytes. */
> +
> + static void
> + vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> + unsigned int precision, signop sign)
> + {
> + /* Round the precision up to a whole number of bytes. */
> + precision = vect_element_precision (precision);
> + if (precision < TYPE_PRECISION (type)
> + && (!stmt_info->operation_precision
> + || stmt_info->operation_precision > precision))
> + {
> + stmt_info->operation_precision = precision;
> + stmt_info->operation_sign = sign;
> + }
> + }
> +
> + /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> + non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
> + is an arbitrary bit precision; it might not be a whole number of bytes. */
> +
> + static void
> + vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> + unsigned int min_input_precision)
> + {
> + /* This operation in isolation only requires the inputs to have
> + MIN_INPUT_PRECISION of precision. However, that doesn't mean
> + that MIN_INPUT_PRECISION is a natural precision for the chain
> + as a whole. E.g. consider something like:
> +
> + unsigned short *x, *y;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + The right shift can be done on unsigned chars, and only requires the
> + result of "*x & 0xf0" to be done on unsigned chars. But taking that
> + approach would mean turning a natural chain of single-vector unsigned
> + short operations into one that truncates "*x" and then extends
> + "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> + operation and one vector for each unsigned char operation.
> + This would be a significant pessimization.
> +
> + Instead only propagate the maximum of this precision and the precision
> + required by the users of the result. This means that we don't pessimize
> + the case above but continue to optimize things like:
> +
> + unsigned char *y;
> + unsigned short *x;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + Here we would truncate two vectors of *x to a single vector of
> + unsigned chars and use single-vector unsigned char operations for
> + everything else, rather than doing two unsigned short copies of
> + "(*x & 0xf0) >> 4" and then truncating the result. */
> + min_input_precision = MAX (min_input_precision,
> + stmt_info->min_output_precision);
> +
> + if (min_input_precision < TYPE_PRECISION (type)
> + && (!stmt_info->min_input_precision
> + || stmt_info->min_input_precision > min_input_precision))
> + stmt_info->min_input_precision = min_input_precision;
> + }
> +
> + /* Subroutine of vect_determine_min_output_precision. Return true if
> + we can calculate a reduced number of output bits for STMT_INFO,
> + whose result is LHS. */
> +
> + static bool
> + vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> + {
> + /* Take the maximum precision required by users of the result. */
> + unsigned int precision = 0;
> + imm_use_iterator iter;
> + use_operand_p use;
> + FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> + return false;
> + stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> + if (!use_stmt_info->min_input_precision)
> + return false;
> + precision = MAX (precision, use_stmt_info->min_input_precision);
> + }
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> + precision);
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> + dump_printf (MSG_NOTE, " are significant\n");
> + }
> + stmt_info->min_output_precision = precision;
> + return true;
> + }
> +
> + /* Calculate min_output_precision for STMT_INFO. */
> +
> + static void
> + vect_determine_min_output_precision (stmt_vec_info stmt_info)
> + {
> + /* We're only interested in statements with a narrowable result. */
> + tree lhs = gimple_get_lhs (stmt_info->stmt);
> + if (!lhs
> + || TREE_CODE (lhs) != SSA_NAME
> + || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> + return;
> +
> + if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> + stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> + }
> +
> + /* Use range information to decide whether STMT (described by STMT_INFO)
> + could be done in a narrower type. This is effectively a forward
> + propagation, since it uses context-independent information that applies
> + to all users of an SSA name. */
> +
> + static void
> + vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> + {
> + tree lhs = gimple_assign_lhs (stmt);
> + if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> + return;
> +
> + tree type = TREE_TYPE (lhs);
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + /* First see whether we have any useful range information for the result. */
> + unsigned int precision = TYPE_PRECISION (type);
> + signop sign = TYPE_SIGN (type);
> + wide_int min_value, max_value;
> + if (!vect_get_range_info (lhs, &min_value, &max_value))
> + return;
> +
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int nops = gimple_num_ops (stmt);
> +
> + if (!vect_truncatable_operation_p (code))
> + /* Check that all relevant input operands are compatible, and update
> + [MIN_VALUE, MAX_VALUE] to include their ranges. */
> + for (unsigned int i = 1; i < nops; ++i)
> + {
> + tree op = gimple_op (stmt, i);
> + if (TREE_CODE (op) == INTEGER_CST)
> + {
> + /* Don't require the integer to have RHS_TYPE (which it might
> + not for things like shift amounts, etc.), but do require it
> + to fit the type. */
> + if (!int_fits_type_p (op, type))
> + return;
> +
> + min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> + max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> + }
> + else if (TREE_CODE (op) == SSA_NAME)
> + {
> + /* Ignore codes that don't take uniform arguments. */
> + if (!types_compatible_p (TREE_TYPE (op), type))
> + return;
> +
> + wide_int op_min_value, op_max_value;
> + if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> + return;
> +
> + min_value = wi::min (min_value, op_min_value, sign);
> + max_value = wi::max (max_value, op_max_value, sign);
> + }
> + else
> + return;
> + }
> +
> + /* Try to switch signed types for unsigned types if we can.
> + This is better for two reasons. First, unsigned ops tend
> + to be cheaper than signed ops. Second, it means that we can
> + handle things like:
> +
> + signed char c;
> + int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> + as:
> +
> + signed char c;
> + unsigned short res_1 = (unsigned short) c & 0xff00;
> + int res = (int) res_1;
> +
> + where the intermediate result res_1 has unsigned rather than
> + signed type. */
> + if (sign == SIGNED && !wi::neg_p (min_value))
> + sign = UNSIGNED;
> +
> + /* See what precision is required for MIN_VALUE and MAX_VALUE. */
> + unsigned int precision1 = wi::min_precision (min_value, sign);
> + unsigned int precision2 = wi::min_precision (max_value, sign);
> + unsigned int value_precision = MAX (precision1, precision2);
> + if (value_precision >= precision)
> + return;
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without loss of precision: ",
> + sign == SIGNED ? "signed" : "unsigned",
> + value_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> +
> + vect_set_operation_type (stmt_info, type, value_precision, sign);
> + vect_set_min_input_precision (stmt_info, type, value_precision);
> + }
> +
> + /* Use information about the users of STMT's result to decide whether
> + STMT (described by STMT_INFO) could be done in a narrower type.
> + This is effectively a backward propagation. */
> +
> + static void
> + vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> + {
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int opno = (code == COND_EXPR ? 2 : 1);
> + tree type = TREE_TYPE (gimple_op (stmt, opno));
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + unsigned int precision = TYPE_PRECISION (type);
> + unsigned int operation_precision, min_input_precision;
> + switch (code)
> + {
> + CASE_CONVERT:
> + /* Only the bits that contribute to the output matter. Don't change
> + the precision of the operation itself. */
> + operation_precision = precision;
> + min_input_precision = stmt_info->min_output_precision;
> + break;
> +
> + case LSHIFT_EXPR:
> + case RSHIFT_EXPR:
> + {
> + tree shift = gimple_assign_rhs2 (stmt);
> + if (TREE_CODE (shift) != INTEGER_CST
> + || !wi::ltu_p (wi::to_widest (shift), precision))
> + return;
> + unsigned int const_shift = TREE_INT_CST_LOW (shift);
> + if (code == LSHIFT_EXPR)
> + {
> + /* We need CONST_SHIFT fewer bits of the input. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = (MAX (operation_precision, const_shift)
> + - const_shift);
> + }
> + else
> + {
> + /* We need CONST_SHIFT extra bits to do the operation. */
> + operation_precision = (stmt_info->min_output_precision
> + + const_shift);
> + min_input_precision = operation_precision;
> + }
> + break;
> + }
> +
> + default:
> + if (vect_truncatable_operation_p (code))
> + {
> + /* Input bit N has no effect on output bits N-1 and lower. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = operation_precision;
> + break;
> + }
> + return;
> + }
> +
> + if (operation_precision < precision)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without affecting users: ",
> + TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> + operation_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> + vect_set_operation_type (stmt_info, type, operation_precision,
> + TYPE_SIGN (type));
> + }
> + vect_set_min_input_precision (stmt_info, type, min_input_precision);
> + }
> +
> + /* Handle vect_determine_precisions for STMT_INFO, given that we
> + have already done so for the users of its result. */
> +
> + void
> + vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> + {
> + vect_determine_min_output_precision (stmt_info);
> + if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> + {
> + vect_determine_precisions_from_range (stmt_info, stmt);
> + vect_determine_precisions_from_users (stmt_info, stmt);
> + }
> + }
> +
> + /* Walk backwards through the vectorizable region to determine the
> + values of these fields:
> +
> + - min_output_precision
> + - min_input_precision
> + - operation_precision
> + - operation_sign. */
> +
> + void
> + vect_determine_precisions (vec_info *vinfo)
> + {
> + DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> + {
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> + unsigned int nbbs = loop->num_nodes;
> +
> + for (unsigned int i = 0; i < nbbs; i++)
> + {
> + basic_block bb = bbs[nbbs - i - 1];
> + for (gimple_stmt_iterator si = gsi_last_bb (bb);
> + !gsi_end_p (si); gsi_prev (&si))
> + vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> + }
> + }
> + else
> + {
> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> + gimple_stmt_iterator si = bb_vinfo->region_end;
> + gimple *stmt;
> + do
> + {
> + if (!gsi_stmt (si))
> + si = gsi_last_bb (bb_vinfo->bb);
> + else
> + gsi_prev (&si);
> + stmt = gsi_stmt (si);
> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> + if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> + vect_determine_stmt_precisions (stmt_info);
> + }
> + while (stmt != gsi_stmt (bb_vinfo->region_begin));
> + }
> + }
> +
> typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
> struct vect_recog_func
> *************** struct vect_recog_func
> *** 4217,4229 ****
> taken which means usually the more complex one needs to precede the
> less complex ones (widen_sum only after dot_prod or sad for example). */
> static vect_recog_func vect_vect_recog_func_ptrs[] = {
> { vect_recog_widen_mult_pattern, "widen_mult" },
> { vect_recog_dot_prod_pattern, "dot_prod" },
> { vect_recog_sad_pattern, "sad" },
> { vect_recog_widen_sum_pattern, "widen_sum" },
> { vect_recog_pow_pattern, "pow" },
> { vect_recog_widen_shift_pattern, "widen_shift" },
> - { vect_recog_over_widening_pattern, "over_widening" },
> { vect_recog_rotate_pattern, "rotate" },
> { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> { vect_recog_divmod_pattern, "divmod" },
> --- 4566,4579 ----
> taken which means usually the more complex one needs to precede the
> less complex ones (widen_sum only after dot_prod or sad for example). */
> static vect_recog_func vect_vect_recog_func_ptrs[] = {
> + { vect_recog_over_widening_pattern, "over_widening" },
> + { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
> { vect_recog_widen_mult_pattern, "widen_mult" },
> { vect_recog_dot_prod_pattern, "dot_prod" },
> { vect_recog_sad_pattern, "sad" },
> { vect_recog_widen_sum_pattern, "widen_sum" },
> { vect_recog_pow_pattern, "pow" },
> { vect_recog_widen_shift_pattern, "widen_shift" },
> { vect_recog_rotate_pattern, "rotate" },
> { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> { vect_recog_divmod_pattern, "divmod" },
> *************** vect_pattern_recog (vec_info *vinfo)
> *** 4497,4502 ****
> --- 4847,4854 ----
> unsigned int i, j;
> auto_vec<gimple *, 1> stmts_to_replace;
>
> + vect_determine_precisions (vinfo);
> +
> DUMP_VECT_SCOPE ("vect_pattern_recog");
>
> if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 62,69 ****
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 62,70 ----
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 58,64 ****
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 58,66 ----
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
> return 0;
> }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment. */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
> return 0;
> }
>
> ! /* This is an over-widening even though the final result is still an int.
> ! It's better to do one vector of ops on chars and then widen than to
> ! widen and then do 4 vectors of ops on ints. */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
> return 0;
> }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment. */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
> return 0;
> }
>
> ! /* This is an over-widening even though the final result is still an int.
> ! It's better to do one vector of ops on chars and then widen than to
> ! widen and then do 4 vectors of ops on ints. */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,62 ****
> return 0;
> }
>
> ! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,65 ----
> return 0;
> }
>
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================
* Re: [14/n] PR85694: Rework overwidening detection
2018-06-29 12:56 ` Richard Sandiford
2018-07-02 11:02 ` Christophe Lyon
@ 2018-07-02 13:12 ` Richard Biener
2018-07-03 10:02 ` Richard Sandiford
1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-07-02 13:12 UTC (permalink / raw)
To: GCC Patches, richard.sandiford
On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > This patch is the main part of PR85694. The aim is to recognise at least:
> >
> > signed char *a, *b, *c;
> > ...
> > for (int i = 0; i < 2048; i++)
> > c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints. However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant. These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands. These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> > (effectively a forward walk, since the range info is
> > context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> > users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code. The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> > and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>
> Here's a version rebased on top of current trunk. Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
> prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before. OK to install?
OK.
Richard.
> Richard
>
>
> 2018-06-29 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * poly-int.h (print_hex): New function.
> * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
> * dumpfile.c (dump_generic_expr): Fix formatting.
> (dump_generic_expr_loc): New function.
> (dump_dec, dump_hex): New poly_wide_int functions.
> * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
> min_input_precision, operation_precision and operation_sign.
> * tree-vect-patterns.c (vect_get_range_info): New function.
> (vect_same_loop_or_bb_p, vect_single_imm_use)
> (vect_operation_fits_smaller_type): Delete.
> (vect_look_through_possible_promotion): Add an optional
> single_use_p parameter.
> (vect_recog_over_widening_pattern): Rewrite to use new
> stmt_vec_info information. Handle one operation at a time.
> (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
> (vect_truncatable_operation_p, vect_set_operation_type)
> (vect_set_min_input_precision): New functions.
> (vect_determine_min_output_precision_1): Likewise.
> (vect_determine_min_output_precision): Likewise.
> (vect_determine_precisions_from_range): Likewise.
> (vect_determine_precisions_from_users): Likewise.
> (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
> (vect_vect_recog_func_ptrs): Put over_widening first.
> Add cast_forwprop.
> (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
> over-widening messages.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
> * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-5.c: Likewise.
> * gcc.dg/vect/vect-over-widen-6.c: Likewise.
> * gcc.dg/vect/vect-over-widen-7.c: Likewise.
> * gcc.dg/vect/vect-over-widen-8.c: Likewise.
> * gcc.dg/vect/vect-over-widen-9.c: Likewise.
> * gcc.dg/vect/vect-over-widen-10.c: Likewise.
> * gcc.dg/vect/vect-over-widen-11.c: Likewise.
> * gcc.dg/vect/vect-over-widen-12.c: Likewise.
> * gcc.dg/vect/vect-over-widen-13.c: Likewise.
> * gcc.dg/vect/vect-over-widen-14.c: Likewise.
> * gcc.dg/vect/vect-over-widen-15.c: Likewise.
> * gcc.dg/vect/vect-over-widen-16.c: Likewise.
> * gcc.dg/vect/vect-over-widen-17.c: Likewise.
> * gcc.dg/vect/vect-over-widen-18.c: Likewise.
> * gcc.dg/vect/vect-over-widen-19.c: Likewise.
> * gcc.dg/vect/vect-over-widen-20.c: Likewise.
> * gcc.dg/vect/vect-over-widen-21.c: Likewise.
>
> Index: gcc/poly-int.h
> ===================================================================
> *** gcc/poly-int.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/poly-int.h 2018-06-29 12:33:06.721263572 +0100
> *************** print_dec (const poly_int_pod<N, C> &val
> *** 2420,2425 ****
> --- 2420,2444 ----
> poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
> }
>
> + /* Use print_hex to print VALUE to FILE. */
> +
> + template<unsigned int N, typename C>
> + void
> + print_hex (const poly_int_pod<N, C> &value, FILE *file)
> + {
> + if (value.is_constant ())
> + print_hex (value.coeffs[0], file);
> + else
> + {
> + fprintf (file, "[");
> + for (unsigned int i = 0; i < N; ++i)
> + {
> + print_hex (value.coeffs[i], file);
> + fputc (i == N - 1 ? ']' : ',', file);
> + }
> + }
> + }
> +
> /* Helper for calculating the distance between two points P1 and P2,
> in cases where known_le (P1, P2). T1 and T2 are the types of the
> two positions, in either order. The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> *** gcc/dumpfile.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.h 2018-06-29 12:33:06.717263602 +0100
> *************** extern void dump_printf_loc (dump_flags_
> *** 425,430 ****
> --- 425,432 ----
> const char *, ...) ATTRIBUTE_PRINTF_3;
> extern void dump_function (int phase, tree fn);
> extern void dump_basic_block (dump_flags_t, basic_block, int);
> + extern void dump_generic_expr_loc (dump_flags_t, const dump_location_t &,
> + dump_flags_t, tree);
> extern void dump_generic_expr (dump_flags_t, dump_flags_t, tree);
> extern void dump_gimple_stmt_loc (dump_flags_t, const dump_location_t &,
> dump_flags_t, gimple *, int);
> *************** extern bool enable_rtl_dump_file (void);
> *** 434,439 ****
> --- 436,443 ----
>
> template<unsigned int N, typename C>
> void dump_dec (dump_flags_t, const poly_int<N, C> &);
> + extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> + extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
> /* In tree-dump.c */
> extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> *** gcc/dumpfile.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/dumpfile.c 2018-06-29 12:33:06.717263602 +0100
> *************** dump_generic_expr (dump_flags_t dump_kin
> *** 498,507 ****
> --- 498,527 ----
> tree t)
> {
> if (dump_file && (dump_kind & pflags))
> + print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> +
> + /* Similar to dump_generic_expr, except additionally print source location. */
> +
> + void
> + dump_generic_expr_loc (dump_flags_t dump_kind, const dump_location_t &loc,
> + dump_flags_t extra_dump_flags, tree t)
> + {
> + location_t srcloc = loc.get_location_t ();
> + if (dump_file && (dump_kind & pflags))
> + {
> + dump_loc (dump_kind, dump_file, srcloc);
> print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> + }
>
> if (alt_dump_file && (dump_kind & alt_flags))
> + {
> + dump_loc (dump_kind, alt_dump_file, srcloc);
> print_generic_expr (alt_dump_file, t, dump_flags | extra_dump_flags);
> + }
> }
>
> /* Output a formatted message using FORMAT on appropriate dump streams. */
> *************** template void dump_dec (dump_flags_t, co
> *** 573,578 ****
> --- 593,620 ----
> template void dump_dec (dump_flags_t, const poly_offset_int &);
> template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> + void
> + dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> + {
> + if (dump_file && (dump_kind & pflags))
> + print_dec (value, dump_file, sgn);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_dec (value, alt_dump_file, sgn);
> + }
> +
> + /* Output VALUE in hexadecimal to appropriate dump streams. */
> +
> + void
> + dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> + {
> + if (dump_file && (dump_kind & pflags))
> + print_hex (value, dump_file);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_hex (value, alt_dump_file);
> + }
> +
> /* Start a dump for PHASE. Store user-supplied dump flags in
> *FLAG_PTR. Return the number of streams opened. Set globals
> DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h 2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vectorizer.h 2018-06-29 12:33:06.725263540 +0100
> *************** typedef struct _stmt_vec_info {
> *** 899,904 ****
> --- 899,919 ----
>
> /* The number of scalar stmt references from active SLP instances. */
> unsigned int num_slp_uses;
> +
> + /* If nonzero, the lhs of the statement could be truncated to this
> + many bits without affecting any users of the result. */
> + unsigned int min_output_precision;
> +
> + /* If nonzero, all non-boolean input operands have the same precision,
> + and they could each be truncated to this many bits without changing
> + the result. */
> + unsigned int min_input_precision;
> +
> + /* If OPERATION_PRECISION is nonzero, the statement could be performed on
> + an integer with the sign and number of bits given by OPERATION_SIGN
> + and OPERATION_PRECISION without changing the result. */
> + unsigned int operation_precision;
> + signop operation_sign;
> } *stmt_vec_info;
>
> /* Information about a gather/scatter call. */
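As a concrete illustration of what these fields record (a standalone sketch, not using GCC internals): in the covering note's example `c[i] = (a[i] + b[i]) >> 1` on signed chars, the add-and-shift only needs 9 bits, so it can be done in 16-bit arithmetic. The exhaustive check below confirms the narrowing is semantics-preserving; it assumes GCC's arithmetic right shift of negative values, which is implementation-defined in ISO C:

```c
#include <assert.h>

/* Return nonzero if (a + b) >> 1 computed in 16-bit arithmetic matches
   the original 32-bit computation for all signed char inputs.  Note:
   right-shifting a negative value is implementation-defined in C; GCC
   defines it as an arithmetic shift, which is assumed here.  */
static int
check_narrowed_average (void)
{
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      {
        signed char wide = (signed char) ((a + b) >> 1);
        signed char narrow = (signed char) ((short) (a + b) >> 1);
        if (wide != narrow)
          return 0;
      }
  return 1;
}
```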
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/tree-vect-patterns.c 2018-06-29 12:33:06.721263572 +0100
> *************** Software Foundation; either version 3, o
> *** 47,52 ****
> --- 47,86 ----
> #include "omp-simd-clone.h"
> #include "predict.h"
>
> + /* Return true if we have a useful VR_RANGE range for VAR, storing it
> + in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
> +
> + static bool
> + vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> + {
> + value_range_type vr_type = get_range_info (var, min_value, max_value);
> + wide_int nonzero = get_nonzero_bits (var);
> + signop sgn = TYPE_SIGN (TREE_TYPE (var));
> + if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> + nonzero, sgn) == VR_RANGE)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has range [");
> + dump_hex (MSG_NOTE, *min_value);
> + dump_printf (MSG_NOTE, ", ");
> + dump_hex (MSG_NOTE, *max_value);
> + dump_printf (MSG_NOTE, "]\n");
> + }
> + return true;
> + }
> + else
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has no range info\n");
> + }
> + return false;
> + }
> + }
> +
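The interplay between the VR_RANGE and the nonzero-bits mask can be illustrated with a simplified helper (a sketch of the idea only; GCC's `intersect_range_with_nonzero_bits` is more general): for an unsigned value whose set bits are confined to a mask, the mask itself is an upper bound, so it can tighten the recorded maximum.

```c
#include <assert.h>

/* If VALUE & ~NONZERO == 0 is known, then VALUE <= NONZERO, so the
   range maximum can be tightened to min (max, nonzero).  Simplified,
   unsigned-only version of the intersection performed above.  */
static unsigned int
tighten_range_max (unsigned int max, unsigned int nonzero)
{
  return max < nonzero ? max : nonzero;
}
```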
> /* Report that we've found an instance of pattern PATTERN in
> statement STMT. */
>
> *************** vect_supportable_direct_optab_p (tree ot
> *** 190,229 ****
> return true;
> }
>
> - /* Check whether STMT2 is in the same loop or basic block as STMT1.
> - Which of the two applies depends on whether we're currently doing
> - loop-based or basic-block-based vectorization, as determined by
> - the vinfo_for_stmt for STMT1 (which must be defined).
> -
> - If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> - to be defined as well. */
> -
> - static bool
> - vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> - {
> - stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> - return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> - }
> -
> - /* If the LHS of DEF_STMT has a single use, and that statement is
> - in the same loop or basic block, return it. */
> -
> - static gimple *
> - vect_single_imm_use (gimple *def_stmt)
> - {
> - tree lhs = gimple_assign_lhs (def_stmt);
> - use_operand_p use_p;
> - gimple *use_stmt;
> -
> - if (!single_imm_use (lhs, &use_p, &use_stmt))
> - return NULL;
> -
> - if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> - return NULL;
> -
> - return use_stmt;
> - }
> -
> /* Round bit precision PRECISION up to a full element. */
>
> static unsigned int
> --- 224,229 ----
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 347,353 ****
> is possible to convert OP' back to OP using a possible sign change
> followed by a possible promotion P. Return this OP', or null if OP is
> not a vectorizable SSA name. If there is a promotion P, describe its
> ! input in UNPROM, otherwise describe OP' in UNPROM.
>
> A successful return means that it is possible to go from OP' to OP
> via UNPROM. The cast from OP' to UNPROM is at most a sign change,
> --- 347,355 ----
> is possible to convert OP' back to OP using a possible sign change
> followed by a possible promotion P. Return this OP', or null if OP is
> not a vectorizable SSA name. If there is a promotion P, describe its
> ! input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
> ! is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> ! have more than one user.
>
> A successful return means that it is possible to go from OP' to OP
> via UNPROM. The cast from OP' to UNPROM is at most a sign change,
> *************** vect_unpromoted_value::set_op (tree op_i
> *** 374,380 ****
>
> static tree
> vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> ! vect_unpromoted_value *unprom)
> {
> tree res = NULL_TREE;
> tree op_type = TREE_TYPE (op);
> --- 376,383 ----
>
> static tree
> vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> ! vect_unpromoted_value *unprom,
> ! bool *single_use_p = NULL)
> {
> tree res = NULL_TREE;
> tree op_type = TREE_TYPE (op);
> *************** vect_look_through_possible_promotion (ve
> *** 420,426 ****
> if (!def_stmt)
> break;
> if (dt == vect_internal_def)
> ! caster = vinfo_for_stmt (def_stmt);
> else
> caster = NULL;
> gassign *assign = dyn_cast <gassign *> (def_stmt);
> --- 423,436 ----
> if (!def_stmt)
> break;
> if (dt == vect_internal_def)
> ! {
> ! caster = vinfo_for_stmt (def_stmt);
> ! /* Ignore pattern statements, since we don't link uses for them. */
> ! if (single_use_p
> ! && !STMT_VINFO_RELATED_STMT (caster)
> ! && !has_single_use (res))
> ! *single_use_p = false;
> ! }
> else
> caster = NULL;
> gassign *assign = dyn_cast <gassign *> (def_stmt);
> *************** vect_recog_widen_sum_pattern (vec<gimple
> *** 1371,1733 ****
> return pattern_stmt;
> }
>
>
> ! /* Return TRUE if the operation in STMT can be performed on a smaller type.
>
> ! Input:
> ! STMT - a statement to check.
> ! DEF - we support operations with two operands, one of which is constant.
> ! The other operand can be defined by a demotion operation, or by a
> ! previous statement in a sequence of over-promoted operations. In the
> ! later case DEF is used to replace that operand. (It is defined by a
> ! pattern statement we created for the previous statement in the
> ! sequence).
> !
> ! Input/output:
> ! NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
> ! NULL, it's the type of DEF.
> ! STMTS - additional pattern statements. If a pattern statement (type
> ! conversion) is created in this function, its original statement is
> ! added to STMTS.
>
> ! Output:
> ! OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> ! operands to use in the new pattern statement for STMT (will be created
> ! in vect_recog_over_widening_pattern ()).
> ! NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> ! statements for STMT: the first one is a type promotion and the second
> ! one is the operation itself. We return the type promotion statement
> ! in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> ! the second pattern statement. */
>
> ! static bool
> ! vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> ! tree *op0, tree *op1, gimple **new_def_stmt,
> ! vec<gimple *> *stmts)
> ! {
> ! enum tree_code code;
> ! tree const_oprnd, oprnd;
> ! tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> ! gimple *def_stmt, *new_stmt;
> ! bool first = false;
> ! bool promotion;
>
> ! *op0 = NULL_TREE;
> ! *op1 = NULL_TREE;
> ! *new_def_stmt = NULL;
>
> ! if (!is_gimple_assign (stmt))
> ! return false;
>
> ! code = gimple_assign_rhs_code (stmt);
> ! if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> ! && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> ! return false;
>
> ! oprnd = gimple_assign_rhs1 (stmt);
> ! const_oprnd = gimple_assign_rhs2 (stmt);
> ! type = gimple_expr_type (stmt);
>
> ! if (TREE_CODE (oprnd) != SSA_NAME
> ! || TREE_CODE (const_oprnd) != INTEGER_CST)
> ! return false;
>
> ! /* If oprnd has other uses besides that in stmt we cannot mark it
> ! as being part of a pattern only. */
> ! if (!has_single_use (oprnd))
> ! return false;
>
> ! /* If we are in the middle of a sequence, we use DEF from a previous
> ! statement. Otherwise, OPRND has to be a result of type promotion. */
> ! if (*new_type)
> ! {
> ! half_type = *new_type;
> ! oprnd = def;
> ! }
> ! else
> {
> ! first = true;
> ! if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> ! &promotion)
> ! || !promotion
> ! || !vect_same_loop_or_bb_p (stmt, def_stmt))
> ! return false;
> }
>
> ! /* Can we perform the operation on a smaller type? */
> ! switch (code)
> ! {
> ! case BIT_IOR_EXPR:
> ! case BIT_XOR_EXPR:
> ! case BIT_AND_EXPR:
> ! if (!int_fits_type_p (const_oprnd, half_type))
> ! {
> ! /* HALF_TYPE is not enough. Try a bigger type if possible. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> ! if (!int_fits_type_p (const_oprnd, interm_type))
> ! return false;
> ! }
> !
> ! break;
> !
> ! case LSHIFT_EXPR:
> ! /* Try intermediate type - HALF_TYPE is not enough for sure. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> ! (e.g., if the original value was char, the shift amount is at most 8
> ! if we want to use short). */
> ! if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> ! if (!vect_supportable_shift (code, interm_type))
> ! return false;
> !
> ! break;
> !
> ! case RSHIFT_EXPR:
> ! if (vect_supportable_shift (code, half_type))
> ! break;
> !
> ! /* Try intermediate type - HALF_TYPE is not supported. */
> ! if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> ! return false;
> !
> ! interm_type = build_nonstandard_integer_type (
> ! TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> !
> ! if (!vect_supportable_shift (code, interm_type))
> ! return false;
> !
> ! break;
> !
> ! default:
> ! gcc_unreachable ();
> ! }
> !
> ! /* There are four possible cases:
> ! 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> ! the first statement in the sequence)
> ! a. The original, HALF_TYPE, is not enough - we replace the promotion
> ! from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> ! b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> ! promotion.
> ! 2. OPRND is defined by a pattern statement we created.
> ! a. Its type is not sufficient for the operation, we create a new stmt:
> ! a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
> ! this statement in NEW_DEF_STMT, and it is later put in
> ! STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> ! b. OPRND is good to use in the new statement. */
> ! if (first)
> ! {
> ! if (interm_type)
> ! {
> ! /* Replace the original type conversion HALF_TYPE->TYPE with
> ! HALF_TYPE->INTERM_TYPE. */
> ! if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> ! {
> ! new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> ! /* Check if the already created pattern stmt is what we need. */
> ! if (!is_gimple_assign (new_stmt)
> ! || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> ! || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> ! return false;
> !
> ! stmts->safe_push (def_stmt);
> ! oprnd = gimple_assign_lhs (new_stmt);
> ! }
> ! else
> ! {
> ! /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
> ! oprnd = gimple_assign_rhs1 (def_stmt);
> ! new_oprnd = make_ssa_name (interm_type);
> ! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> ! stmts->safe_push (def_stmt);
> ! oprnd = new_oprnd;
> ! }
> ! }
> ! else
> ! {
> ! /* Retrieve the operand before the type promotion. */
> ! oprnd = gimple_assign_rhs1 (def_stmt);
> ! }
> ! }
> ! else
> ! {
> ! if (interm_type)
> ! {
> ! /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
> ! new_oprnd = make_ssa_name (interm_type);
> ! new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> ! oprnd = new_oprnd;
> ! *new_def_stmt = new_stmt;
> ! }
>
> ! /* Otherwise, OPRND is already set. */
> }
>
> ! if (interm_type)
> ! *new_type = interm_type;
> ! else
> ! *new_type = half_type;
>
> ! *op0 = oprnd;
> ! *op1 = fold_convert (*new_type, const_oprnd);
> !
> ! return true;
> }
>
>
> ! /* Try to find a statement or a sequence of statements that can be performed
> ! on a smaller type:
>
> ! type x_t;
> ! TYPE x_T, res0_T, res1_T;
> ! loop:
> ! S1 x_t = *p;
> ! S2 x_T = (TYPE) x_t;
> ! S3 res0_T = op (x_T, C0);
> ! S4 res1_T = op (res0_T, C1);
> ! S5 ... = () res1_T; - type demotion
> !
> ! where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> ! constants.
> ! Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
> ! be 'type' or some intermediate type. For now, we expect S5 to be a type
> ! demotion operation. We also check that S3 and S4 have only one use. */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! gimple *stmt = stmts->pop ();
> ! gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> ! *use_stmt = NULL;
> ! tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> ! tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> ! bool first;
> ! tree type = NULL;
> !
> ! first = true;
> ! while (1)
> ! {
> ! if (!vinfo_for_stmt (stmt)
> ! || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> ! return NULL;
> !
> ! new_def_stmt = NULL;
> ! if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> ! &op0, &op1, &new_def_stmt,
> ! stmts))
> ! {
> ! if (first)
> ! return NULL;
> ! else
> ! break;
> ! }
>
> ! /* STMT can be performed on a smaller type. Check its uses. */
> ! use_stmt = vect_single_imm_use (stmt);
> ! if (!use_stmt || !is_gimple_assign (use_stmt))
> ! return NULL;
> !
> ! /* Create pattern statement for STMT. */
> ! vectype = get_vectype_for_scalar_type (new_type);
> ! if (!vectype)
> ! return NULL;
> !
> ! /* We want to collect all the statements for which we create pattern
> ! statetments, except for the case when the last statement in the
> ! sequence doesn't have a corresponding pattern statement. In such
> ! case we associate the last pattern statement with the last statement
> ! in the sequence. Therefore, we only add the original statement to
> ! the list if we know that it is not the last. */
> ! if (prev_stmt)
> ! stmts->safe_push (prev_stmt);
>
> ! var = vect_recog_temp_ssa_var (new_type, NULL);
> ! pattern_stmt
> ! = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> ! new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
>
> ! if (dump_enabled_p ())
> ! {
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "created pattern stmt: ");
> ! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> ! }
>
> ! type = gimple_expr_type (stmt);
> ! prev_stmt = stmt;
> ! stmt = use_stmt;
> !
> ! first = false;
> ! }
> !
> ! /* We got a sequence. We expect it to end with a type demotion operation.
> ! Otherwise, we quit (for now). There are three possible cases: the
> ! conversion is to NEW_TYPE (we don't do anything), the conversion is to
> ! a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> ! NEW_TYPE differs (we create a new conversion statement). */
> ! if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> ! {
> ! use_lhs = gimple_assign_lhs (use_stmt);
> ! use_type = TREE_TYPE (use_lhs);
> ! /* Support only type demotion or signedess change. */
> ! if (!INTEGRAL_TYPE_P (use_type)
> ! || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> ! return NULL;
>
> ! /* Check that NEW_TYPE is not bigger than the conversion result. */
> ! if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> ! return NULL;
>
> ! if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> ! || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> ! {
> ! *type_out = get_vectype_for_scalar_type (use_type);
> ! if (!*type_out)
> ! return NULL;
>
> ! /* Create NEW_TYPE->USE_TYPE conversion. */
> ! new_oprnd = make_ssa_name (use_type);
> ! pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> ! STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> !
> ! /* We created a pattern statement for the last statement in the
> ! sequence, so we don't need to associate it with the pattern
> ! statement created for PREV_STMT. Therefore, we add PREV_STMT
> ! to the list in order to mark it later in vect_pattern_recog_1. */
> ! if (prev_stmt)
> ! stmts->safe_push (prev_stmt);
> ! }
> ! else
> ! {
> ! if (prev_stmt)
> ! STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> ! = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
>
> ! *type_out = vectype;
> ! }
>
> ! stmts->safe_push (use_stmt);
> ! }
> ! else
> ! /* TODO: support general case, create a conversion to the correct type. */
> return NULL;
>
> ! /* Pattern detected. */
> ! vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
>
> return pattern_stmt;
> }
>
> --- 1381,1698 ----
> return pattern_stmt;
> }
>
> + /* Recognize cases in which an operation is performed in one type WTYPE
> + but could be done more efficiently in a narrower type NTYPE. For example,
> + if we have:
> +
> + ATYPE a; // narrower than NTYPE
> + BTYPE b; // narrower than NTYPE
> + WTYPE aw = (WTYPE) a;
> + WTYPE bw = (WTYPE) b;
> + WTYPE res = aw + bw; // only uses of aw and bw
> +
> + then it would be more efficient to do:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b;
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + Other situations include things like:
> +
> + ATYPE a; // NTYPE or narrower
> + WTYPE aw = (WTYPE) a;
> + WTYPE res = aw + b;
> +
> + when only "(NTYPE) res" is significant. In that case it's more efficient
> + to truncate "b" and do the operation on NTYPE instead:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b; // truncation
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + All users of "res" should then use "resn" instead, making the final
> + statement dead (not marked as relevant). The final statement is still
> + needed to maintain the type correctness of the IR.
> +
> + vect_determine_precisions has already determined the minimum
> + precision of the operation and the minimum precision required
> + by users of the result. */
>
> ! static gimple *
> ! vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> ! if (!last_stmt)
> ! return NULL;
>
> ! /* See whether we have found that this operation can be done on a
> ! narrower type without changing its semantics. */
> ! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> ! unsigned int new_precision = last_stmt_info->operation_precision;
> ! if (!new_precision)
> ! return NULL;
>
> ! vec_info *vinfo = last_stmt_info->vinfo;
> ! tree lhs = gimple_assign_lhs (last_stmt);
> ! tree type = TREE_TYPE (lhs);
> ! tree_code code = gimple_assign_rhs_code (last_stmt);
> !
> ! /* Keep the first operand of a COND_EXPR as-is: only the other two
> ! operands are interesting. */
> ! unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> ! /* Check the operands. */
> ! unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> ! auto_vec <vect_unpromoted_value, 3> unprom (nops);
> ! unprom.quick_grow (nops);
> ! unsigned int min_precision = 0;
> ! bool single_use_p = false;
> ! for (unsigned int i = 0; i < nops; ++i)
> ! {
> ! tree op = gimple_op (last_stmt, first_op + i);
> ! if (TREE_CODE (op) == INTEGER_CST)
> ! unprom[i].set_op (op, vect_constant_def);
> ! else if (TREE_CODE (op) == SSA_NAME)
> ! {
> ! bool op_single_use_p = true;
> ! if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> ! &op_single_use_p))
> ! return NULL;
> ! /* If:
>
> ! (1) N bits of the result are needed;
> ! (2) all inputs are widened from M<N bits; and
> ! (3) one operand OP is a single-use SSA name
> !
> ! we can shift the M->N widening from OP to the output
> ! without changing the number or type of extensions involved.
> ! This then reduces the number of copies of STMT_INFO.
> !
> ! If instead of (3) more than one operand is a single-use SSA name,
> ! shifting the extension to the output is even more of a win.
> !
> ! If instead:
> !
> ! (1) N bits of the result are needed;
> ! (2) one operand OP2 is widened from M2<N bits;
> ! (3) another operand OP1 is widened from M1<M2 bits; and
> ! (4) both OP1 and OP2 are single-use
> !
> ! the choice is between:
> !
> ! (a) truncating OP2 to M1, doing the operation on M1,
> ! and then widening the result to N
> !
> ! (b) widening OP1 to M2, doing the operation on M2, and then
> ! widening the result to N
> !
> ! Both shift the M2->N widening of the inputs to the output.
> ! (a) additionally shifts the M1->M2 widening to the output;
> ! it requires fewer copies of STMT_INFO but requires an extra
> ! M2->M1 truncation.
> !
> ! Which is better will depend on the complexity and cost of
> ! STMT_INFO, which is hard to predict at this stage. However,
> ! a clear tie-breaker in favor of (b) is the fact that the
> ! truncation in (a) increases the length of the operation chain.
> !
> ! If instead of (4) only one of OP1 or OP2 is single-use,
> ! (b) is still a win over doing the operation in N bits:
> ! it still shifts the M2->N widening on the single-use operand
> ! to the output and reduces the number of STMT_INFO copies.
> !
> ! If neither operand is single-use then operating on fewer than
> ! N bits might lead to more extensions overall. Whether it does
> ! or not depends on global information about the vectorization
> ! region, and whether that's a good trade-off would again
> ! depend on the complexity and cost of the statements involved,
> ! as well as things like register pressure that are not normally
> ! modelled at this stage. We therefore ignore these cases
> ! and just optimize the clear single-use wins above.
> !
> ! Thus we take the maximum precision of the unpromoted operands
> ! and record whether any operand is single-use. */
> ! if (unprom[i].dt == vect_internal_def)
> ! {
> ! min_precision = MAX (min_precision,
> ! TYPE_PRECISION (unprom[i].type));
> ! single_use_p |= op_single_use_p;
> ! }
> ! }
> ! }
>
> ! /* Although the operation could be done in operation_precision, we have
> ! to balance that against introducing extra truncations or extensions.
> ! Calculate the minimum precision that can be handled efficiently.
> !
> ! The loop above determined that the operation could be handled
> ! efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> ! extension from the inputs to the output without introducing more
> ! instructions, and would reduce the number of instructions required
> ! for STMT_INFO itself.
> !
> ! vect_determine_precisions has also determined that the result only
> ! needs min_output_precision bits. Truncating by a factor of N times
> ! requires a tree of N - 1 instructions, so if TYPE is N times wider
> ! than min_output_precision, doing the operation in TYPE and truncating
> ! the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> ! In contrast:
> !
> ! - truncating the input to a unary operation and doing the operation
> ! in the new type requires at most N - 1 + 1 = N instructions per
> ! output vector
> !
> ! - doing the same for a binary operation requires at most
> ! (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> !
> ! Both unary and binary operations require fewer instructions than
> ! this if the operands were extended from a suitable truncated form.
> ! Thus there is usually nothing to lose by doing operations in
> ! min_output_precision bits, but there can be something to gain. */
> ! if (!single_use_p)
> ! min_precision = last_stmt_info->min_output_precision;
> ! else
> ! min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
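The instruction-count argument in the comment above can be written out as a small cost model (per output vector, assuming TYPE is N times wider than the needed output precision; this is an illustration of the arithmetic, not code from the patch):

```c
#include <assert.h>

/* Do the operation in TYPE, then truncate the result:
   N copies of the operation plus an (N - 1)-instruction truncation
   tree, i.e. N + (N - 1) = 2N - 1 instructions.  */
static unsigned int
cost_wide_then_truncate (unsigned int n) { return n + (n - 1); }

/* Truncate a unary operation's input first:
   (N - 1) truncations + 1 operation = N instructions.  */
static unsigned int
cost_narrow_unary (unsigned int n) { return (n - 1) + 1; }

/* Truncate a binary operation's two inputs first:
   2 * (N - 1) truncations + 1 operation = 2N - 1 instructions.  */
static unsigned int
cost_narrow_binary (unsigned int n) { return 2 * (n - 1) + 1; }
```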
> ! /* Apply the minimum efficient precision we just calculated. */
> ! if (new_precision < min_precision)
> ! new_precision = min_precision;
> ! if (new_precision >= TYPE_PRECISION (type))
> ! return NULL;
>
> ! vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> ! *type_out = get_vectype_for_scalar_type (type);
> ! if (!*type_out)
> ! return NULL;
>
> ! /* We've found a viable pattern. Get the new type of the operation. */
> ! bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> ! tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> !
> ! /* We specifically don't check here whether the target supports the
> ! new operation, since it might be something that a later pattern
> ! wants to rewrite anyway. If targets have a minimum element size
> ! for some optabs, we should pattern-match smaller ops to larger ops
> ! where beneficial. */
> ! tree new_vectype = get_vectype_for_scalar_type (new_type);
> ! if (!new_vectype)
> ! return NULL;
>
> ! if (dump_enabled_p ())
> {
> ! dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> ! dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> ! dump_printf (MSG_NOTE, " to ");
> ! dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> ! dump_printf (MSG_NOTE, "\n");
> }
>
> ! /* Calculate the rhs operands for an operation on NEW_TYPE. */
> ! STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> ! tree ops[3] = {};
> ! for (unsigned int i = 1; i < first_op; ++i)
> ! ops[i - 1] = gimple_op (last_stmt, i);
> ! vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> ! new_type, &unprom[0], new_vectype);
> !
> ! /* Use the operation to produce a result of type NEW_TYPE. */
> ! tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> ! gimple *pattern_stmt = gimple_build_assign (new_var, code,
> ! ops[0], ops[1], ops[2]);
> ! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> ! if (dump_enabled_p ())
> ! {
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "created pattern stmt: ");
> ! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> }
>
> ! pattern_stmt = vect_convert_output (last_stmt_info, type,
> ! pattern_stmt, new_vectype);
>
> ! stmts->safe_push (last_stmt);
> ! return pattern_stmt;
> }
>
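The transformation that the function above recognises can be demonstrated directly in C (a standalone sketch with ATYPE = BTYPE = unsigned char and NTYPE = unsigned short): doing the addition on shorts and widening afterwards gives bit-identical results, since the sum fits in 9 bits.

```c
#include <assert.h>

/* Check that the WTYPE result "(unsigned) a + (unsigned) b" equals the
   narrowed form "(WTYPE) ((NTYPE) a + (NTYPE) b)" for all unsigned char
   inputs.  (C's integer promotions mean the short addition is still
   evaluated in int; this only illustrates the value-level equivalence
   the vectorizer relies on.)  */
static int
check_over_widening (void)
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        unsigned wide = a + b;                            /* WTYPE op  */
        unsigned short narrow = (unsigned short) (a + b); /* NTYPE op  */
        if ((unsigned) narrow != wide)                    /* (WTYPE) resn  */
          return 0;
      }
  return 1;
}
```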
> + /* Recognize cases in which the input to a cast is wider than its
> + output, and the input is fed by a widening operation. Fold this
> + by removing the unnecessary intermediate widening. E.g.:
>
> ! unsigned char a;
> ! unsigned int b = (unsigned int) a;
> ! unsigned short c = (unsigned short) b;
>
> ! -->
>
> ! unsigned short c = (unsigned short) a;
>
> ! Although this is rare in input IR, it is an expected side-effect
> ! of the over-widening pattern above.
>
> ! This is beneficial also for integer-to-float conversions, if the
> ! widened integer has more bits than the float, and if the unwidened
> ! input doesn't. */
>
> ! static gimple *
> ! vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> ! {
> ! /* Check for a cast, including an integer-to-float conversion. */
> ! gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> ! if (!last_stmt)
> ! return NULL;
> ! tree_code code = gimple_assign_rhs_code (last_stmt);
> ! if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> ! return NULL;
>
> ! /* Make sure that the rhs is a scalar with a natural bitsize. */
> ! tree lhs = gimple_assign_lhs (last_stmt);
> ! if (!lhs)
> ! return NULL;
> ! tree lhs_type = TREE_TYPE (lhs);
> ! scalar_mode lhs_mode;
> ! if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> ! || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> ! return NULL;
>
> ! /* Check for a narrowing operation (from a vector point of view). */
> ! tree rhs = gimple_assign_rhs1 (last_stmt);
> ! tree rhs_type = TREE_TYPE (rhs);
> ! if (!INTEGRAL_TYPE_P (rhs_type)
> ! || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> ! || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> ! return NULL;
>
> ! /* Try to find an unpromoted input. */
> ! stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> ! vec_info *vinfo = last_stmt_info->vinfo;
> ! vect_unpromoted_value unprom;
> ! if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> ! || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> ! return NULL;
>
> ! /* If the bits above RHS_TYPE matter, make sure that they're the
> ! same when extending from UNPROM as they are when extending from RHS. */
> ! if (!INTEGRAL_TYPE_P (lhs_type)
> ! && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> ! return NULL;
>
> ! /* We can get the same result by casting UNPROM directly, to avoid
> ! the unnecessary widening and narrowing. */
> ! vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> ! *type_out = get_vectype_for_scalar_type (lhs_type);
> ! if (!*type_out)
> return NULL;
>
> ! tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> ! gimple *pattern_stmt = gimple_build_assign (new_var, NOP_EXPR, unprom.op);
> ! gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> + stmts->safe_push (last_stmt);
> return pattern_stmt;
> }
>
> *************** vect_recog_gather_scatter_pattern (vec<g
> *** 4205,4210 ****
> --- 4170,4559 ----
> return pattern_stmt;
> }
>
> + /* Return true if TYPE is a non-boolean integer type. These are the types
> + that we want to consider for narrowing. */
> +
> + static bool
> + vect_narrowable_type_p (tree type)
> + {
> + return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> + }
> +
> + /* Return true if the operation given by CODE can be truncated to N bits
> + when only N bits of the output are needed. This is only true if bit N+1
> + of the inputs has no effect on the low N bits of the result. */
> +
> + static bool
> + vect_truncatable_operation_p (tree_code code)
> + {
> + switch (code)
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> + case MULT_EXPR:
> + case BIT_AND_EXPR:
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + case COND_EXPR:
> + return true;
> +
> + default:
> + return false;
> + }
> + }
> +
> + /* Record that STMT_INFO could be changed from operating on TYPE to
> + operating on a type with the precision and sign given by PRECISION
> + and SIGN respectively. PRECISION is an arbitrary bit precision;
> + it might not be a whole number of bytes. */
> +
> + static void
> + vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> + unsigned int precision, signop sign)
> + {
> + /* Round the precision up to a whole number of bytes. */
> + precision = vect_element_precision (precision);
> + if (precision < TYPE_PRECISION (type)
> + && (!stmt_info->operation_precision
> + || stmt_info->operation_precision > precision))
> + {
> + stmt_info->operation_precision = precision;
> + stmt_info->operation_sign = sign;
> + }
> + }
> +
> + /* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> + non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
> + is an arbitrary bit precision; it might not be a whole number of bytes. */
> +
> + static void
> + vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> + unsigned int min_input_precision)
> + {
> + /* This operation in isolation only requires the inputs to have
> + MIN_INPUT_PRECISION of precision.  However, that doesn't mean
> + that MIN_INPUT_PRECISION is a natural precision for the chain
> + as a whole. E.g. consider something like:
> +
> + unsigned short *x, *y;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + The right shift can be done on unsigned chars, and only requires the
> + result of "*x & 0xf0" to be done on unsigned chars. But taking that
> + approach would mean turning a natural chain of single-vector unsigned
> + short operations into one that truncates "*x" and then extends
> + "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> + operation and one vector for each unsigned char operation.
> + This would be a significant pessimization.
> +
> + Instead only propagate the maximum of this precision and the precision
> + required by the users of the result. This means that we don't pessimize
> + the case above but continue to optimize things like:
> +
> + unsigned char *y;
> + unsigned short *x;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + Here we would truncate two vectors of *x to a single vector of
> + unsigned chars and use single-vector unsigned char operations for
> + everything else, rather than doing two unsigned short copies of
> + "(*x & 0xf0) >> 4" and then truncating the result. */
> + min_input_precision = MAX (min_input_precision,
> + stmt_info->min_output_precision);
> +
> + if (min_input_precision < TYPE_PRECISION (type)
> + && (!stmt_info->min_input_precision
> + || stmt_info->min_input_precision > min_input_precision))
> + stmt_info->min_input_precision = min_input_precision;
> + }
> +
> + /* Subroutine of vect_determine_min_output_precision. Return true if
> + we can calculate a reduced number of output bits for STMT_INFO,
> + whose result is LHS. */
> +
> + static bool
> + vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> + {
> + /* Take the maximum precision required by users of the result. */
> + unsigned int precision = 0;
> + imm_use_iterator iter;
> + use_operand_p use;
> + FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> + return false;
> + stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> + if (!use_stmt_info->min_input_precision)
> + return false;
> + precision = MAX (precision, use_stmt_info->min_input_precision);
> + }
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> + precision);
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> + dump_printf (MSG_NOTE, " are significant\n");
> + }
> + stmt_info->min_output_precision = precision;
> + return true;
> + }
> +
> + /* Calculate min_output_precision for STMT_INFO. */
> +
> + static void
> + vect_determine_min_output_precision (stmt_vec_info stmt_info)
> + {
> + /* We're only interested in statements with a narrowable result. */
> + tree lhs = gimple_get_lhs (stmt_info->stmt);
> + if (!lhs
> + || TREE_CODE (lhs) != SSA_NAME
> + || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> + return;
> +
> + if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> + stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> + }
> +
> + /* Use range information to decide whether STMT (described by STMT_INFO)
> + could be done in a narrower type. This is effectively a forward
> + propagation, since it uses context-independent information that applies
> + to all users of an SSA name. */
> +
> + static void
> + vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> + {
> + tree lhs = gimple_assign_lhs (stmt);
> + if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> + return;
> +
> + tree type = TREE_TYPE (lhs);
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + /* First see whether we have any useful range information for the result. */
> + unsigned int precision = TYPE_PRECISION (type);
> + signop sign = TYPE_SIGN (type);
> + wide_int min_value, max_value;
> + if (!vect_get_range_info (lhs, &min_value, &max_value))
> + return;
> +
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int nops = gimple_num_ops (stmt);
> +
> + if (!vect_truncatable_operation_p (code))
> + /* Check that all relevant input operands are compatible, and update
> + [MIN_VALUE, MAX_VALUE] to include their ranges. */
> + for (unsigned int i = 1; i < nops; ++i)
> + {
> + tree op = gimple_op (stmt, i);
> + if (TREE_CODE (op) == INTEGER_CST)
> + {
> + /* Don't require the integer to have RHS_TYPE (which it might
> + not for things like shift amounts, etc.), but do require it
> + to fit the type. */
> + if (!int_fits_type_p (op, type))
> + return;
> +
> + min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> + max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> + }
> + else if (TREE_CODE (op) == SSA_NAME)
> + {
> + /* Ignore codes that don't take uniform arguments. */
> + if (!types_compatible_p (TREE_TYPE (op), type))
> + return;
> +
> + wide_int op_min_value, op_max_value;
> + if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> + return;
> +
> + min_value = wi::min (min_value, op_min_value, sign);
> + max_value = wi::max (max_value, op_max_value, sign);
> + }
> + else
> + return;
> + }
> +
> + /* Try to switch signed types for unsigned types if we can.
> + This is better for two reasons. First, unsigned ops tend
> + to be cheaper than signed ops. Second, it means that we can
> + handle things like:
> +
> + signed char c;
> + int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> + as:
> +
> + signed char c;
> + unsigned short res_1 = (unsigned short) c & 0xff00;
> + int res = (int) res_1;
> +
> + where the intermediate result res_1 has unsigned rather than
> + signed type. */
> + if (sign == SIGNED && !wi::neg_p (min_value))
> + sign = UNSIGNED;
> +
> + /* See what precision is required for MIN_VALUE and MAX_VALUE. */
> + unsigned int precision1 = wi::min_precision (min_value, sign);
> + unsigned int precision2 = wi::min_precision (max_value, sign);
> + unsigned int value_precision = MAX (precision1, precision2);
> + if (value_precision >= precision)
> + return;
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without loss of precision: ",
> + sign == SIGNED ? "signed" : "unsigned",
> + value_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> +
> + vect_set_operation_type (stmt_info, type, value_precision, sign);
> + vect_set_min_input_precision (stmt_info, type, value_precision);
> + }
> +
> + /* Use information about the users of STMT's result to decide whether
> + STMT (described by STMT_INFO) could be done in a narrower type.
> + This is effectively a backward propagation. */
> +
> + static void
> + vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> + {
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int opno = (code == COND_EXPR ? 2 : 1);
> + tree type = TREE_TYPE (gimple_op (stmt, opno));
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + unsigned int precision = TYPE_PRECISION (type);
> + unsigned int operation_precision, min_input_precision;
> + switch (code)
> + {
> + CASE_CONVERT:
> + /* Only the bits that contribute to the output matter. Don't change
> + the precision of the operation itself. */
> + operation_precision = precision;
> + min_input_precision = stmt_info->min_output_precision;
> + break;
> +
> + case LSHIFT_EXPR:
> + case RSHIFT_EXPR:
> + {
> + tree shift = gimple_assign_rhs2 (stmt);
> + if (TREE_CODE (shift) != INTEGER_CST
> + || !wi::ltu_p (wi::to_widest (shift), precision))
> + return;
> + unsigned int const_shift = TREE_INT_CST_LOW (shift);
> + if (code == LSHIFT_EXPR)
> + {
> + /* We need CONST_SHIFT fewer bits of the input. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = (MAX (operation_precision, const_shift)
> + - const_shift);
> + }
> + else
> + {
> + /* We need CONST_SHIFT extra bits to do the operation. */
> + operation_precision = (stmt_info->min_output_precision
> + + const_shift);
> + min_input_precision = operation_precision;
> + }
> + break;
> + }
> +
> + default:
> + if (vect_truncatable_operation_p (code))
> + {
> + /* Input bit N has no effect on output bits N-1 and lower. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = operation_precision;
> + break;
> + }
> + return;
> + }
> +
> + if (operation_precision < precision)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without affecting users: ",
> + TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> + operation_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> + vect_set_operation_type (stmt_info, type, operation_precision,
> + TYPE_SIGN (type));
> + }
> + vect_set_min_input_precision (stmt_info, type, min_input_precision);
> + }
> +
> + /* Handle vect_determine_precisions for STMT_INFO, given that we
> + have already done so for the users of its result. */
> +
> + void
> + vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> + {
> + vect_determine_min_output_precision (stmt_info);
> + if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> + {
> + vect_determine_precisions_from_range (stmt_info, stmt);
> + vect_determine_precisions_from_users (stmt_info, stmt);
> + }
> + }
> +
> + /* Walk backwards through the vectorizable region to determine the
> + values of these fields:
> +
> + - min_output_precision
> + - min_input_precision
> + - operation_precision
> + - operation_sign. */
> +
> + void
> + vect_determine_precisions (vec_info *vinfo)
> + {
> + DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> + {
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> + unsigned int nbbs = loop->num_nodes;
> +
> + for (unsigned int i = 0; i < nbbs; i++)
> + {
> + basic_block bb = bbs[nbbs - i - 1];
> + for (gimple_stmt_iterator si = gsi_last_bb (bb);
> + !gsi_end_p (si); gsi_prev (&si))
> + vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> + }
> + }
> + else
> + {
> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> + gimple_stmt_iterator si = bb_vinfo->region_end;
> + gimple *stmt;
> + do
> + {
> + if (!gsi_stmt (si))
> + si = gsi_last_bb (bb_vinfo->bb);
> + else
> + gsi_prev (&si);
> + stmt = gsi_stmt (si);
> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> + if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> + vect_determine_stmt_precisions (stmt_info);
> + }
> + while (stmt != gsi_stmt (bb_vinfo->region_begin));
> + }
> + }
> +
> typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
> struct vect_recog_func
> *************** struct vect_recog_func
> *** 4217,4229 ****
> taken which means usually the more complex one needs to precede the
> less complex ones (widen_sum only after dot_prod or sad for example).
> static vect_recog_func vect_vect_recog_func_ptrs[] = {
> { vect_recog_widen_mult_pattern, "widen_mult" },
> { vect_recog_dot_prod_pattern, "dot_prod" },
> { vect_recog_sad_pattern, "sad" },
> { vect_recog_widen_sum_pattern, "widen_sum" },
> { vect_recog_pow_pattern, "pow" },
> { vect_recog_widen_shift_pattern, "widen_shift" },
> - { vect_recog_over_widening_pattern, "over_widening" },
> { vect_recog_rotate_pattern, "rotate" },
> { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> { vect_recog_divmod_pattern, "divmod" },
> --- 4566,4579 ----
> taken which means usually the more complex one needs to precede the
> less complex ones (widen_sum only after dot_prod or sad for example).
> static vect_recog_func vect_vect_recog_func_ptrs[] = {
> + { vect_recog_over_widening_pattern, "over_widening" },
> + { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
> { vect_recog_widen_mult_pattern, "widen_mult" },
> { vect_recog_dot_prod_pattern, "dot_prod" },
> { vect_recog_sad_pattern, "sad" },
> { vect_recog_widen_sum_pattern, "widen_sum" },
> { vect_recog_pow_pattern, "pow" },
> { vect_recog_widen_shift_pattern, "widen_shift" },
> { vect_recog_rotate_pattern, "rotate" },
> { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> { vect_recog_divmod_pattern, "divmod" },
> *************** vect_pattern_recog (vec_info *vinfo)
> *** 4497,4502 ****
> --- 4847,4854 ----
> unsigned int i, j;
> auto_vec<gimple *, 1> stmts_to_replace;
>
> + vect_determine_precisions (vinfo);
> +
> DUMP_VECT_SCOPE ("vect_pattern_recog");
>
> if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 62,69 ****
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 62,70 ----
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 58,64 ****
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 58,66 ----
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
> return 0;
> }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment. */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
> return 0;
> }
>
> ! /* This is an over-widening even though the final result is still an int.
> ! It's better to do one vector of ops on chars and then widen than to
> ! widen and then do 4 vectors of ops on ints. */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,63 ****
> return 0;
> }
>
> ! /* Final value stays in int, so no over-widening is detected at the moment. */
> ! /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,68 ----
> return 0;
> }
>
> ! /* This is an over-widening even though the final result is still an int.
> ! It's better to do one vector of ops on chars and then widen than to
> ! widen and then do 4 vectors of ops on ints. */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.000000000 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-06-29 12:33:06.721263572 +0100
> *************** int main (void)
> *** 57,62 ****
> return 0;
> }
>
> ! /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --- 57,65 ----
> return 0;
> }
>
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> ! /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-02 11:02 ` Christophe Lyon
@ 2018-07-02 13:37 ` Richard Sandiford
2018-07-02 13:52 ` Christophe Lyon
0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2018-07-02 13:37 UTC (permalink / raw)
To: Christophe Lyon; +Cc: gcc Patches
Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > This patch is the main part of PR85694. The aim is to recognise at least:
>> >
>> > signed char *a, *b, *c;
>> > ...
>> > for (int i = 0; i < 2048; i++)
>> > c[i] = (a[i] + b[i]) >> 1;
>> >
>> > as an over-widening pattern, since the addition and shift can be done
>> > on shorts rather than ints. However, it ended up being a lot more
>> > general than that.
>> >
>> > The current over-widening pattern detection is limited to a few simple
>> > cases: logical ops with immediate second operands, and shifts by a
>> > constant. These cases are enough for common pixel-format conversion
>> > and can be detected in a peephole way.
>> >
>> > The loop above requires two generalisations of the current code: support
>> > for addition as well as logical ops, and support for non-constant second
>> > operands. These are harder to detect in the same peephole way, so the
>> > patch tries to take a more global approach.
>> >
>> > The idea is to get information about the minimum operation width
>> > in two ways:
>> >
>> > (1) by using the range information attached to the SSA_NAMEs
>> > (effectively a forward walk, since the range info is
>> > context-independent).
>> >
>> > (2) by back-propagating the number of output bits required by
>> > users of the result.
>> >
>> > As explained in the comments, there's a balance to be struck between
>> > narrowing an individual operation and fitting in with the surrounding
>> > code. The approach is pretty conservative: if we could narrow an
>> > operation to N bits without changing its semantics, it's OK to do that if:
>> >
>> > - no operations later in the chain require more than N bits; or
>> >
>> > - all internally-defined inputs are extended from N bits or fewer,
>> > and at least one of them is single-use.
>> >
>> > See the comments for the rationale.
>> >
>> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> > since the code seemed more readable without.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>
>> Here's a version rebased on top of current trunk. Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before. OK to install?
>>
>
> Hi Richard,
>
> This patch introduces regressions on arm-none-linux-gnueabihf:
> gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
Sorry about that, it was caused by a stupid typo. I've applied the
below as obvious.
(For the record, it was actually 12/n that caused this. 14/n hasn't
been applied yet.)
Thanks,
Richard
2018-07-02 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo
in dump string.
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c 2018-07-02 14:30:57.000000000 +0100
+++ gcc/tree-vect-patterns.c 2018-07-02 14:30:57.383750450 +0100
@@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp
{
return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR,
WIDEN_LSHIFT_EXPR, true,
- "vect_widen_shift_pattern");
+ "vect_recog_widen_shift_pattern");
}
/* Detect a rotate pattern that wouldn't otherwise be vectorized:
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-02 13:37 ` Richard Sandiford
@ 2018-07-02 13:52 ` Christophe Lyon
0 siblings, 0 replies; 10+ messages in thread
From: Christophe Lyon @ 2018-07-02 13:52 UTC (permalink / raw)
To: gcc Patches, Richard Sandiford
On Mon, 2 Jul 2018 at 15:37, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Christophe Lyon <christophe.lyon@linaro.org> writes:
> > On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> > This patch is the main part of PR85694. The aim is to recognise at least:
> >> >
> >> > signed char *a, *b, *c;
> >> > ...
> >> > for (int i = 0; i < 2048; i++)
> >> > c[i] = (a[i] + b[i]) >> 1;
> >> >
> >> > as an over-widening pattern, since the addition and shift can be done
> >> > on shorts rather than ints. However, it ended up being a lot more
> >> > general than that.
> >> >
> >> > The current over-widening pattern detection is limited to a few simple
> >> > cases: logical ops with immediate second operands, and shifts by a
> >> > constant. These cases are enough for common pixel-format conversion
> >> > and can be detected in a peephole way.
> >> >
> >> > The loop above requires two generalisations of the current code: support
> >> > for addition as well as logical ops, and support for non-constant second
> >> > operands. These are harder to detect in the same peephole way, so the
> >> > patch tries to take a more global approach.
> >> >
> >> > The idea is to get information about the minimum operation width
> >> > in two ways:
> >> >
> >> > (1) by using the range information attached to the SSA_NAMEs
> >> > (effectively a forward walk, since the range info is
> >> > context-independent).
> >> >
> >> > (2) by back-propagating the number of output bits required by
> >> > users of the result.
> >> >
> >> > As explained in the comments, there's a balance to be struck between
> >> > narrowing an individual operation and fitting in with the surrounding
> >> > code. The approach is pretty conservative: if we could narrow an
> >> > operation to N bits without changing its semantics, it's OK to do that if:
> >> >
> >> > - no operations later in the chain require more than N bits; or
> >> >
> >> > - all internally-defined inputs are extended from N bits or fewer,
> >> > and at least one of them is single-use.
> >> >
> >> > See the comments for the rationale.
> >> >
> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> >> > since the code seemed more readable without.
> >> >
> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
> >>
> >> Here's a version rebased on top of current trunk. Changes from last time:
> >>
> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
> >> prototype
> >>
> >> - fix a typo in a comment
> >>
> >> - use vect_element_precision from the new version of 12/n.
> >>
> >> Tested as before. OK to install?
> >>
> >
> > Hi Richard,
> >
> > This patch introduces regressions on arm-none-linux-gnueabihf:
> > gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> > vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> > vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
> > gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 8
> > gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
>
> Sorry about that; it was caused by a stupid typo. I've applied the
> below as obvious.
>
> (For the record, it was actually 12/n that caused this. 14/n hasn't
> been applied yet.)
>
Sorry about the confusion; I probably messed up in Gmail when
searching for the mail containing the patch that caused the
regression.
> Thanks,
> Richard
>
>
> 2018-07-02 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo
> in dump string.
>
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c 2018-07-02 14:30:57.000000000 +0100
> +++ gcc/tree-vect-patterns.c 2018-07-02 14:30:57.383750450 +0100
> @@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp
> {
> return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR,
> WIDEN_LSHIFT_EXPR, true,
> - "vect_widen_shift_pattern");
> + "vect_recog_widen_shift_pattern");
> }
>
> /* Detect a rotate pattern wouldn't be otherwise vectorized:
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-02 13:12 ` Richard Biener
@ 2018-07-03 10:02 ` Richard Sandiford
2018-07-03 20:08 ` Christophe Lyon
0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2018-07-03 10:02 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > This patch is the main part of PR85694. The aim is to recognise at least:
>> >
>> > signed char *a, *b, *c;
>> > ...
>> > for (int i = 0; i < 2048; i++)
>> > c[i] = (a[i] + b[i]) >> 1;
>> >
>> > as an over-widening pattern, since the addition and shift can be done
>> > on shorts rather than ints. However, it ended up being a lot more
>> > general than that.
>> >
>> > The current over-widening pattern detection is limited to a few simple
>> > cases: logical ops with immediate second operands, and shifts by a
>> > constant. These cases are enough for common pixel-format conversion
>> > and can be detected in a peephole way.
>> >
>> > The loop above requires two generalisations of the current code: support
>> > for addition as well as logical ops, and support for non-constant second
>> > operands. These are harder to detect in the same peephole way, so the
>> > patch tries to take a more global approach.
>> >
>> > The idea is to get information about the minimum operation width
>> > in two ways:
>> >
>> > (1) by using the range information attached to the SSA_NAMEs
>> > (effectively a forward walk, since the range info is
>> > context-independent).
>> >
>> > (2) by back-propagating the number of output bits required by
>> > users of the result.
>> >
>> > As explained in the comments, there's a balance to be struck between
>> > narrowing an individual operation and fitting in with the surrounding
>> > code. The approach is pretty conservative: if we could narrow an
>> > operation to N bits without changing its semantics, it's OK to do that if:
>> >
>> > - no operations later in the chain require more than N bits; or
>> >
>> > - all internally-defined inputs are extended from N bits or fewer,
>> > and at least one of them is single-use.
>> >
>> > See the comments for the rationale.
>> >
>> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> > since the code seemed more readable without.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>
>> Here's a version rebased on top of current trunk. Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before. OK to install?
>
> OK.
Thanks. For the record, here's what I installed (updated on top of
Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
Richard
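As a compilable illustration of the transformation the patch targets (an editor's sketch, not part of the patch; the helper names are invented), the widened and narrowed forms of the motivating loop compute the same values:

```c
#include <assert.h>

/* Widened form: the addition and shift are performed in int,
   as the C integer promotions dictate.  */
static signed char
avg_int (signed char a, signed char b)
{
  return (signed char) (((int) a + (int) b) >> 1);
}

/* Narrowed form: the intermediate sum is held in 16 bits, which is
   what the over-widening pattern is allowed to produce.  The sum of
   two 8-bit values always fits in 16 bits, so the results agree.  */
static signed char
avg_short (signed char a, signed char b)
{
  short sum = (short) ((short) a + (short) b);
  return (signed char) (sum >> 1);
}
```

The vectorized payoff comes from the analogous rewrite on vector types: halving the element width doubles the number of lanes processed per vector operation.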
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_dec, dump_hex): Declare.
* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information. Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.
gcc/testsuite/
* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
widen_mult pattern.
* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
over-widening messages.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-5.c: Likewise.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* gcc.dg/vect/vect-over-widen-16.c: Likewise.
* gcc.dg/vect/vect-over-widen-17.c: Likewise.
* gcc.dg/vect/vect-over-widen-18.c: Likewise.
* gcc.dg/vect/vect-over-widen-19.c: Likewise.
* gcc.dg/vect/vect-over-widen-20.c: Likewise.
* gcc.dg/vect/vect-over-widen-21.c: Likewise.
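The range-based half of the analysis below boils down to asking how many bits a signed range needs. Here is a minimal standalone sketch of that calculation (an editor's analogue of the wide-int helpers the patch relies on; the function names are invented for illustration):

```c
#include <assert.h>

/* Number of bits needed to represent X as an unsigned value.  */
static unsigned int
bit_width (unsigned long long x)
{
  unsigned int n = 0;
  while (x)
    {
      ++n;
      x >>= 1;
    }
  return n;
}

/* Minimum signed precision that can hold V: the magnitude bits plus
   a sign bit.  For negative V, ~V gives the bits that must be
   representable below the sign bit in two's complement.  */
static unsigned int
signed_min_precision (long long v)
{
  unsigned long long mag = (v < 0
			    ? ~(unsigned long long) v
			    : (unsigned long long) v);
  return bit_width (mag) + 1;
}

/* Minimum signed precision that can hold every value in [LO, HI].  */
static unsigned int
range_min_precision (long long lo, long long hi)
{
  unsigned int a = signed_min_precision (lo);
  unsigned int b = signed_min_precision (hi);
  return a > b ? a : b;
}
```

For the motivating loop, a[i] and b[i] each lie in [-128, 127], so their sum lies in [-256, 254] and range_min_precision (-256, 254) returns 9; rounded up to a full element width, the addition fits in 16 bits, which is the fact the pattern exploits.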
------------------------------------------------------------------------------
Index: gcc/poly-int.h
===================================================================
--- gcc/poly-int.h 2018-07-03 09:01:31.075962445 +0100
+++ gcc/poly-int.h 2018-07-03 09:02:36.563413564 +0100
@@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &val
poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
}
+/* Use print_hex to print VALUE to FILE. */
+
+template<unsigned int N, typename C>
+void
+print_hex (const poly_int_pod<N, C> &value, FILE *file)
+{
+ if (value.is_constant ())
+ print_hex (value.coeffs[0], file);
+ else
+ {
+ fprintf (file, "[");
+ for (unsigned int i = 0; i < N; ++i)
+ {
+ print_hex (value.coeffs[i], file);
+ fputc (i == N - 1 ? ']' : ',', file);
+ }
+ }
+}
+
/* Helper for calculating the distance between two points P1 and P2,
in cases where known_le (P1, P2). T1 and T2 are the types of the
two positions, in either order. The coefficients of P2 - P1 have
Index: gcc/dumpfile.h
===================================================================
--- gcc/dumpfile.h 2018-07-02 14:30:09.280175397 +0100
+++ gcc/dumpfile.h 2018-07-03 09:02:36.563413564 +0100
@@ -436,6 +436,8 @@ extern bool enable_rtl_dump_file (void);
template<unsigned int N, typename C>
void dump_dec (dump_flags_t, const poly_int<N, C> &);
+extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+extern void dump_hex (dump_flags_t, const poly_wide_int &);
/* In tree-dump.c */
extern void dump_node (const_tree, dump_flags_t, FILE *);
Index: gcc/dumpfile.c
===================================================================
--- gcc/dumpfile.c 2018-07-03 09:01:31.071962478 +0100
+++ gcc/dumpfile.c 2018-07-03 09:02:36.563413564 +0100
@@ -597,6 +597,28 @@ template void dump_dec (dump_flags_t, co
template void dump_dec (dump_flags_t, const poly_offset_int &);
template void dump_dec (dump_flags_t, const poly_widest_int &);
+void
+dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+{
+ if (dump_file && (dump_kind & pflags))
+ print_dec (value, dump_file, sgn);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_dec (value, alt_dump_file, sgn);
+}
+
+/* Output VALUE in hexadecimal to appropriate dump streams. */
+
+void
+dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+{
+ if (dump_file && (dump_kind & pflags))
+ print_hex (value, dump_file);
+
+ if (alt_dump_file && (dump_kind & alt_flags))
+ print_hex (value, alt_dump_file);
+}
+
/* The current dump scope-nesting depth. */
static int dump_scope_depth;
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h 2018-07-03 09:01:31.079962411 +0100
+++ gcc/tree-vectorizer.h 2018-07-03 09:02:36.567413531 +0100
@@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
/* The number of scalar stmt references from active SLP instances. */
unsigned int num_slp_uses;
+
+ /* If nonzero, the lhs of the statement could be truncated to this
+ many bits without affecting any users of the result. */
+ unsigned int min_output_precision;
+
+ /* If nonzero, all non-boolean input operands have the same precision,
+ and they could each be truncated to this many bits without changing
+ the result. */
+ unsigned int min_input_precision;
+
+ /* If OPERATION_BITS is nonzero, the statement could be performed on
+ an integer with the sign and number of bits given by OPERATION_SIGN
+ and OPERATION_BITS without changing the result. */
+ unsigned int operation_precision;
+ signop operation_sign;
} *stmt_vec_info;
/* Information about a gather/scatter call. */
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c 2018-07-03 09:01:31.035962780 +0100
+++ gcc/tree-vect-patterns.c 2018-07-03 09:02:36.567413531 +0100
@@ -47,6 +47,40 @@ Software Foundation; either version 3, o
#include "omp-simd-clone.h"
#include "predict.h"
+/* Return true if we have a useful VR_RANGE range for VAR, storing it
+ in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
+
+static bool
+vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
+{
+ value_range_type vr_type = get_range_info (var, min_value, max_value);
+ wide_int nonzero = get_nonzero_bits (var);
+ signop sgn = TYPE_SIGN (TREE_TYPE (var));
+ if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
+ nonzero, sgn) == VR_RANGE)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has range [");
+ dump_hex (MSG_NOTE, *min_value);
+ dump_printf (MSG_NOTE, ", ");
+ dump_hex (MSG_NOTE, *max_value);
+ dump_printf (MSG_NOTE, "]\n");
+ }
+ return true;
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ {
+ dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+ dump_printf (MSG_NOTE, " has no range info\n");
+ }
+ return false;
+ }
+}
+
/* Report that we've found an instance of pattern PATTERN in
statement STMT. */
@@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree ot
return true;
}
-/* Check whether STMT2 is in the same loop or basic block as STMT1.
- Which of the two applies depends on whether we're currently doing
- loop-based or basic-block-based vectorization, as determined by
- the vinfo_for_stmt for STMT1 (which must be defined).
-
- If this returns true, vinfo_for_stmt for STMT2 is guaranteed
- to be defined as well. */
-
-static bool
-vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
-{
- stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
- return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
-}
-
-/* If the LHS of DEF_STMT has a single use, and that statement is
- in the same loop or basic block, return it. */
-
-static gimple *
-vect_single_imm_use (gimple *def_stmt)
-{
- tree lhs = gimple_assign_lhs (def_stmt);
- use_operand_p use_p;
- gimple *use_stmt;
-
- if (!single_imm_use (lhs, &use_p, &use_stmt))
- return NULL;
-
- if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
- return NULL;
-
- return use_stmt;
-}
-
/* Round bit precision PRECISION up to a full element. */
static unsigned int
@@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_i
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
- input in UNPROM, otherwise describe OP' in UNPROM.
+ input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
+ is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
+ have more than one user.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
@@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_i
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
- vect_unpromoted_value *unprom)
+ vect_unpromoted_value *unprom,
+ bool *single_use_p = NULL)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
@@ -420,7 +423,14 @@ vect_look_through_possible_promotion (ve
if (!def_stmt)
break;
if (dt == vect_internal_def)
- caster = vinfo_for_stmt (def_stmt);
+ {
+ caster = vinfo_for_stmt (def_stmt);
+ /* Ignore pattern statements, since we don't link uses for them. */
+ if (single_use_p
+ && !STMT_VINFO_RELATED_STMT (caster)
+ && !has_single_use (res))
+ *single_use_p = false;
+ }
else
caster = NULL;
gassign *assign = dyn_cast <gassign *> (def_stmt);
@@ -1371,363 +1381,318 @@ vect_recog_widen_sum_pattern (vec<gimple
return pattern_stmt;
}
+/* Recognize cases in which an operation is performed in one type WTYPE
+ but could be done more efficiently in a narrower type NTYPE. For example,
+ if we have:
+
+ ATYPE a; // narrower than NTYPE
+ BTYPE b; // narrower than NTYPE
+ WTYPE aw = (WTYPE) a;
+ WTYPE bw = (WTYPE) b;
+ WTYPE res = aw + bw; // only uses of aw and bw
+
+ then it would be more efficient to do:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b;
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ Other situations include things like:
+
+ ATYPE a; // NTYPE or narrower
+ WTYPE aw = (WTYPE) a;
+ WTYPE res = aw + b;
+
+ when only "(NTYPE) res" is significant. In that case it's more efficient
+ to truncate "b" and do the operation on NTYPE instead:
+
+ NTYPE an = (NTYPE) a;
+ NTYPE bn = (NTYPE) b; // truncation
+ NTYPE resn = an + bn;
+ WTYPE res = (WTYPE) resn;
+
+ All users of "res" should then use "resn" instead, making the final
+ statement dead (not marked as relevant). The final statement is still
+ needed to maintain the type correctness of the IR.
+
+ vect_determine_precisions has already determined the minimum
+ precision of the operation and the minimum precision required
+ by users of the result. */
-/* Return TRUE if the operation in STMT can be performed on a smaller type.
+static gimple *
+vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+ gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+ if (!last_stmt)
+ return NULL;
- Input:
- STMT - a statement to check.
- DEF - we support operations with two operands, one of which is constant.
- The other operand can be defined by a demotion operation, or by a
- previous statement in a sequence of over-promoted operations. In the
- later case DEF is used to replace that operand. (It is defined by a
- pattern statement we created for the previous statement in the
- sequence).
-
- Input/output:
- NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
- NULL, it's the type of DEF.
- STMTS - additional pattern statements. If a pattern statement (type
- conversion) is created in this function, its original statement is
- added to STMTS.
+ /* See whether we have found that this operation can be done on a
+ narrower type without changing its semantics. */
+ stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+ unsigned int new_precision = last_stmt_info->operation_precision;
+ if (!new_precision)
+ return NULL;
- Output:
- OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
- operands to use in the new pattern statement for STMT (will be created
- in vect_recog_over_widening_pattern ()).
- NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
- statements for STMT: the first one is a type promotion and the second
- one is the operation itself. We return the type promotion statement
- in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
- the second pattern statement. */
+ vec_info *vinfo = last_stmt_info->vinfo;
+ tree lhs = gimple_assign_lhs (last_stmt);
+ tree type = TREE_TYPE (lhs);
+ tree_code code = gimple_assign_rhs_code (last_stmt);
+
+ /* Keep the first operand of a COND_EXPR as-is: only the other two
+ operands are interesting. */
+ unsigned int first_op = (code == COND_EXPR ? 2 : 1);
-static bool
-vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
- tree *op0, tree *op1, gimple **new_def_stmt,
- vec<gimple *> *stmts)
-{
- enum tree_code code;
- tree const_oprnd, oprnd;
- tree interm_type = NULL_TREE, half_type, new_oprnd, type;
- gimple *def_stmt, *new_stmt;
- bool first = false;
- bool promotion;
+ /* Check the operands. */
+ unsigned int nops = gimple_num_ops (last_stmt) - first_op;
+ auto_vec <vect_unpromoted_value, 3> unprom (nops);
+ unprom.quick_grow (nops);
+ unsigned int min_precision = 0;
+ bool single_use_p = false;
+ for (unsigned int i = 0; i < nops; ++i)
+ {
+ tree op = gimple_op (last_stmt, first_op + i);
+ if (TREE_CODE (op) == INTEGER_CST)
+ unprom[i].set_op (op, vect_constant_def);
+ else if (TREE_CODE (op) == SSA_NAME)
+ {
+ bool op_single_use_p = true;
+ if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
+ &op_single_use_p))
+ return NULL;
+ /* If:
- *op0 = NULL_TREE;
- *op1 = NULL_TREE;
- *new_def_stmt = NULL;
+ (1) N bits of the result are needed;
+ (2) all inputs are widened from M<N bits; and
+ (3) one operand OP is a single-use SSA name
+
+ we can shift the M->N widening from OP to the output
+ without changing the number or type of extensions involved.
+ This then reduces the number of copies of STMT_INFO.
+
+ If instead of (3) more than one operand is a single-use SSA name,
+ shifting the extension to the output is even more of a win.
+
+ If instead:
+
+ (1) N bits of the result are needed;
+ (2) one operand OP2 is widened from M2<N bits;
+ (3) another operand OP1 is widened from M1<M2 bits; and
+ (4) both OP1 and OP2 are single-use
+
+ the choice is between:
+
+ (a) truncating OP2 to M1, doing the operation on M1,
+ and then widening the result to N
+
+ (b) widening OP1 to M2, doing the operation on M2, and then
+ widening the result to N
+
+ Both shift the M2->N widening of the inputs to the output.
+ (a) additionally shifts the M1->M2 widening to the output;
+ it requires fewer copies of STMT_INFO but requires an extra
+ M2->M1 truncation.
+
+ Which is better will depend on the complexity and cost of
+ STMT_INFO, which is hard to predict at this stage. However,
+ a clear tie-breaker in favor of (b) is the fact that the
+ truncation in (a) increases the length of the operation chain.
+
+ If instead of (4) only one of OP1 or OP2 is single-use,
+ (b) is still a win over doing the operation in N bits:
+ it still shifts the M2->N widening on the single-use operand
+ to the output and reduces the number of STMT_INFO copies.
+
+ If neither operand is single-use then operating on fewer than
+ N bits might lead to more extensions overall. Whether it does
+ or not depends on global information about the vectorization
+ region, and whether that's a good trade-off would again
+ depend on the complexity and cost of the statements involved,
+ as well as things like register pressure that are not normally
+ modelled at this stage. We therefore ignore these cases
+ and just optimize the clear single-use wins above.
+
+ Thus we take the maximum precision of the unpromoted operands
+ and record whether any operand is single-use. */
+ if (unprom[i].dt == vect_internal_def)
+ {
+ min_precision = MAX (min_precision,
+ TYPE_PRECISION (unprom[i].type));
+ single_use_p |= op_single_use_p;
+ }
+ }
+ }
- if (!is_gimple_assign (stmt))
- return false;
+ /* Although the operation could be done in operation_precision, we have
+ to balance that against introducing extra truncations or extensions.
+ Calculate the minimum precision that can be handled efficiently.
+
+ The loop above determined that the operation could be handled
+ efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
+ extension from the inputs to the output without introducing more
+ instructions, and would reduce the number of instructions required
+ for STMT_INFO itself.
+
+ vect_determine_precisions has also determined that the result only
+ needs min_output_precision bits. Truncating by a factor of N times
+ requires a tree of N - 1 instructions, so if TYPE is N times wider
+ than min_output_precision, doing the operation in TYPE and truncating
+ the result requires N + (N - 1) = 2N - 1 instructions per output vector.
+ In contrast:
+
+ - truncating the input to a unary operation and doing the operation
+ in the new type requires at most N - 1 + 1 = N instructions per
+ output vector
+
+ - doing the same for a binary operation requires at most
+ (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
+
+ Both unary and binary operations require fewer instructions than
+ this if the operands were extended from a suitable truncated form.
+ Thus there is usually nothing to lose by doing operations in
+ min_output_precision bits, but there can be something to gain. */
+ if (!single_use_p)
+ min_precision = last_stmt_info->min_output_precision;
+ else
+ min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
- code = gimple_assign_rhs_code (stmt);
- if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
- && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
- return false;
+ /* Apply the minimum efficient precision we just calculated. */
+ if (new_precision < min_precision)
+ new_precision = min_precision;
+ if (new_precision >= TYPE_PRECISION (type))
+ return NULL;
- oprnd = gimple_assign_rhs1 (stmt);
- const_oprnd = gimple_assign_rhs2 (stmt);
- type = gimple_expr_type (stmt);
+ vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
- if (TREE_CODE (oprnd) != SSA_NAME
- || TREE_CODE (const_oprnd) != INTEGER_CST)
- return false;
+ *type_out = get_vectype_for_scalar_type (type);
+ if (!*type_out)
+ return NULL;
- /* If oprnd has other uses besides that in stmt we cannot mark it
- as being part of a pattern only. */
- if (!has_single_use (oprnd))
- return false;
+ /* We've found a viable pattern. Get the new type of the operation. */
+ bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
+ tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
+
+ /* We specifically don't check here whether the target supports the
+ new operation, since it might be something that a later pattern
+ wants to rewrite anyway. If targets have a minimum element size
+ for some optabs, we should pattern-match smaller ops to larger ops
+ where beneficial. */
+ tree new_vectype = get_vectype_for_scalar_type (new_type);
+ if (!new_vectype)
+ return NULL;
- /* If we are in the middle of a sequence, we use DEF from a previous
- statement. Otherwise, OPRND has to be a result of type promotion. */
- if (*new_type)
- {
- half_type = *new_type;
- oprnd = def;
- }
- else
+ if (dump_enabled_p ())
{
- first = true;
- if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
- &promotion)
- || !promotion
- || !vect_same_loop_or_bb_p (stmt, def_stmt))
- return false;
+ dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
+ dump_printf (MSG_NOTE, " to ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
+ dump_printf (MSG_NOTE, "\n");
}
- /* Can we perform the operation on a smaller type? */
- switch (code)
- {
- case BIT_IOR_EXPR:
- case BIT_XOR_EXPR:
- case BIT_AND_EXPR:
- if (!int_fits_type_p (const_oprnd, half_type))
- {
- /* HALF_TYPE is not enough. Try a bigger type if possible. */
- if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
- return false;
-
- interm_type = build_nonstandard_integer_type (
- TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
- if (!int_fits_type_p (const_oprnd, interm_type))
- return false;
- }
-
- break;
-
- case LSHIFT_EXPR:
- /* Try intermediate type - HALF_TYPE is not enough for sure. */
- if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
- return false;
-
- /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
- (e.g., if the original value was char, the shift amount is at most 8
- if we want to use short). */
- if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
- return false;
-
- interm_type = build_nonstandard_integer_type (
- TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
-
- if (!vect_supportable_shift (code, interm_type))
- return false;
-
- break;
-
- case RSHIFT_EXPR:
- if (vect_supportable_shift (code, half_type))
- break;
-
- /* Try intermediate type - HALF_TYPE is not supported. */
- if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
- return false;
-
- interm_type = build_nonstandard_integer_type (
- TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
-
- if (!vect_supportable_shift (code, interm_type))
- return false;
-
- break;
-
- default:
- gcc_unreachable ();
- }
-
- /* There are four possible cases:
- 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
- the first statement in the sequence)
- a. The original, HALF_TYPE, is not enough - we replace the promotion
- from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
- b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
- promotion.
- 2. OPRND is defined by a pattern statement we created.
- a. Its type is not sufficient for the operation, we create a new stmt:
- a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
- this statement in NEW_DEF_STMT, and it is later put in
- STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
- b. OPRND is good to use in the new statement. */
- if (first)
- {
- if (interm_type)
- {
- /* Replace the original type conversion HALF_TYPE->TYPE with
- HALF_TYPE->INTERM_TYPE. */
- if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
- {
- new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
- /* Check if the already created pattern stmt is what we need. */
- if (!is_gimple_assign (new_stmt)
- || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
- || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
- return false;
-
- stmts->safe_push (def_stmt);
- oprnd = gimple_assign_lhs (new_stmt);
- }
- else
- {
- /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
- oprnd = gimple_assign_rhs1 (def_stmt);
- new_oprnd = make_ssa_name (interm_type);
- new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
- STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
- stmts->safe_push (def_stmt);
- oprnd = new_oprnd;
- }
- }
- else
- {
- /* Retrieve the operand before the type promotion. */
- oprnd = gimple_assign_rhs1 (def_stmt);
- }
- }
- else
- {
- if (interm_type)
- {
- /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
- new_oprnd = make_ssa_name (interm_type);
- new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
- oprnd = new_oprnd;
- *new_def_stmt = new_stmt;
- }
+ /* Calculate the rhs operands for an operation on NEW_TYPE. */
+ STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
+ tree ops[3] = {};
+ for (unsigned int i = 1; i < first_op; ++i)
+ ops[i - 1] = gimple_op (last_stmt, i);
+ vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
+ new_type, &unprom[0], new_vectype);
+
+ /* Use the operation to produce a result of type NEW_TYPE. */
+ tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
+ gimple *pattern_stmt = gimple_build_assign (new_var, code,
+ ops[0], ops[1], ops[2]);
+ gimple_set_location (pattern_stmt, gimple_location (last_stmt));
- /* Otherwise, OPRND is already set. */
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "created pattern stmt: ");
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
}
- if (interm_type)
- *new_type = interm_type;
- else
- *new_type = half_type;
+ pattern_stmt = vect_convert_output (last_stmt_info, type,
+ pattern_stmt, new_vectype);
- *op0 = oprnd;
- *op1 = fold_convert (*new_type, const_oprnd);
-
- return true;
+ stmts->safe_push (last_stmt);
+ return pattern_stmt;
}
+/* Recognize cases in which the input to a cast is wider than its
+ output, and the input is fed by a widening operation. Fold this
+ by removing the unnecessary intermediate widening. E.g.:
-/* Try to find a statement or a sequence of statements that can be performed
- on a smaller type:
+ unsigned char a;
+ unsigned int b = (unsigned int) a;
+ unsigned short c = (unsigned short) b;
- type x_t;
- TYPE x_T, res0_T, res1_T;
- loop:
- S1 x_t = *p;
- S2 x_T = (TYPE) x_t;
- S3 res0_T = op (x_T, C0);
- S4 res1_T = op (res0_T, C1);
- S5 ... = () res1_T; - type demotion
-
- where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
- constants.
- Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
- be 'type' or some intermediate type. For now, we expect S5 to be a type
- demotion operation. We also check that S3 and S4 have only one use. */
+ -->
-static gimple *
-vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
-{
- gimple *stmt = stmts->pop ();
- gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
- *use_stmt = NULL;
- tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
- tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
- bool first;
- tree type = NULL;
-
- first = true;
- while (1)
- {
- if (!vinfo_for_stmt (stmt)
- || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
- return NULL;
-
- new_def_stmt = NULL;
- if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
- &op0, &op1, &new_def_stmt,
- stmts))
- {
- if (first)
- return NULL;
- else
- break;
- }
+ unsigned short c = (unsigned short) a;
- /* STMT can be performed on a smaller type. Check its uses. */
- use_stmt = vect_single_imm_use (stmt);
- if (!use_stmt || !is_gimple_assign (use_stmt))
- return NULL;
-
- /* Create pattern statement for STMT. */
- vectype = get_vectype_for_scalar_type (new_type);
- if (!vectype)
- return NULL;
-
- /* We want to collect all the statements for which we create pattern
- statetments, except for the case when the last statement in the
- sequence doesn't have a corresponding pattern statement. In such
- case we associate the last pattern statement with the last statement
- in the sequence. Therefore, we only add the original statement to
- the list if we know that it is not the last. */
- if (prev_stmt)
- stmts->safe_push (prev_stmt);
+ Although this is rare in input IR, it is an expected side-effect
+ of the over-widening pattern above.
- var = vect_recog_temp_ssa_var (new_type, NULL);
- pattern_stmt
- = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
- STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
- new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
+ This is also beneficial for integer-to-float conversions, if the
+ widened integer has more bits than the float, and if the unwidened
+ input doesn't. */
- if (dump_enabled_p ())
- {
- dump_printf_loc (MSG_NOTE, vect_location,
- "created pattern stmt: ");
- dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
- }
+static gimple *
+vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+ /* Check for a cast, including an integer-to-float conversion. */
+ gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+ if (!last_stmt)
+ return NULL;
+ tree_code code = gimple_assign_rhs_code (last_stmt);
+ if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
+ return NULL;
- type = gimple_expr_type (stmt);
- prev_stmt = stmt;
- stmt = use_stmt;
-
- first = false;
- }
-
- /* We got a sequence. We expect it to end with a type demotion operation.
- Otherwise, we quit (for now). There are three possible cases: the
- conversion is to NEW_TYPE (we don't do anything), the conversion is to
- a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
- NEW_TYPE differs (we create a new conversion statement). */
- if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
- {
- use_lhs = gimple_assign_lhs (use_stmt);
- use_type = TREE_TYPE (use_lhs);
- /* Support only type demotion or signedess change. */
- if (!INTEGRAL_TYPE_P (use_type)
- || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
- return NULL;
+ /* Make sure that the rhs is a scalar with a natural bitsize. */
+ tree lhs = gimple_assign_lhs (last_stmt);
+ if (!lhs)
+ return NULL;
+ tree lhs_type = TREE_TYPE (lhs);
+ scalar_mode lhs_mode;
+ if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
+ || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
+ return NULL;
- /* Check that NEW_TYPE is not bigger than the conversion result. */
- if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
- return NULL;
+ /* Check for a narrowing operation (from a vector point of view). */
+ tree rhs = gimple_assign_rhs1 (last_stmt);
+ tree rhs_type = TREE_TYPE (rhs);
+ if (!INTEGRAL_TYPE_P (rhs_type)
+ || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
+ || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
+ return NULL;
- if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
- || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
- {
- *type_out = get_vectype_for_scalar_type (use_type);
- if (!*type_out)
- return NULL;
+ /* Try to find an unpromoted input. */
+ stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+ vec_info *vinfo = last_stmt_info->vinfo;
+ vect_unpromoted_value unprom;
+ if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
+ || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
+ return NULL;
- /* Create NEW_TYPE->USE_TYPE conversion. */
- new_oprnd = make_ssa_name (use_type);
- pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
- STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
-
- /* We created a pattern statement for the last statement in the
- sequence, so we don't need to associate it with the pattern
- statement created for PREV_STMT. Therefore, we add PREV_STMT
- to the list in order to mark it later in vect_pattern_recog_1. */
- if (prev_stmt)
- stmts->safe_push (prev_stmt);
- }
- else
- {
- if (prev_stmt)
- STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
- = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
+ /* If the bits above RHS_TYPE matter, make sure that they're the
+ same when extending from UNPROM as they are when extending from RHS. */
+ if (!INTEGRAL_TYPE_P (lhs_type)
+ && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
+ return NULL;
- *type_out = vectype;
- }
+ /* We can get the same result by casting UNPROM directly, to avoid
+ the unnecessary widening and narrowing. */
+ vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
- stmts->safe_push (use_stmt);
- }
- else
- /* TODO: support general case, create a conversion to the correct type. */
+ *type_out = get_vectype_for_scalar_type (lhs_type);
+ if (!*type_out)
return NULL;
- /* Pattern detected. */
- vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
+ tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
+ gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
+ gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+ stmts->safe_push (last_stmt);
return pattern_stmt;
}
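For reference, here is the cast-forwprop fold in scalar C terms (an illustration only, not part of the patch; the function names are made up). The two functions must agree for all inputs, which is what licenses replacing the widen-then-narrow chain with a single cast:

```c
#include <assert.h>

/* Illustration only: the widen-then-narrow cast chain that
   vect_recog_cast_forwprop_pattern removes.  */
static unsigned short
widen_then_narrow (unsigned char a)
{
  unsigned int b = (unsigned int) a;	/* unnecessary widening */
  return (unsigned short) b;		/* narrowing back down */
}

/* The single direct cast that the pattern substitutes.  */
static unsigned short
direct_cast (unsigned char a)
{
  return (unsigned short) a;
}
```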
@@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<g
return pattern_stmt;
}
+/* Return true if TYPE is a non-boolean integer type. These are the types
+ that we want to consider for narrowing. */
+
+static bool
+vect_narrowable_type_p (tree type)
+{
+ return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+}
+
+/* Return true if the operation given by CODE can be truncated to N bits
+ when only N bits of the output are needed. This is only true if bit N+1
+ of the inputs has no effect on the low N bits of the result. */
+
+static bool
+vect_truncatable_operation_p (tree_code code)
+{
+ switch (code)
+ {
+ case PLUS_EXPR:
+ case MINUS_EXPR:
+ case MULT_EXPR:
+ case BIT_AND_EXPR:
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ case COND_EXPR:
+ return true;
+
+ default:
+ return false;
+ }
+}
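The defining property of a truncatable operation can be checked directly in C: for addition, the low N bits of the result depend only on the low N bits of the inputs, whereas for a right shift bit N of the input leaks into bit N-1 of the output. A small sketch (illustration only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Truncatable: computing in 32 bits and truncating to 8 gives the
   same answer as truncating the inputs first.  */
static uint8_t
add_then_truncate (uint32_t a, uint32_t b)
{
  return (uint8_t) (a + b);
}

static uint8_t
truncate_then_add (uint32_t a, uint32_t b)
{
  return (uint8_t) ((uint8_t) a + (uint8_t) b);
}

/* Not truncatable: a right shift moves high input bits into the
   low output bits, so the same trick changes the result.  */
static uint8_t
shift_then_truncate (uint32_t a)
{
  return (uint8_t) (a >> 1);
}

static uint8_t
truncate_then_shift (uint32_t a)
{
  return (uint8_t) ((uint8_t) a >> 1);
}
```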
+
+/* Record that STMT_INFO could be changed from operating on TYPE to
+ operating on a type with the precision and sign given by PRECISION
+ and SIGN respectively. PRECISION is an arbitrary bit precision;
+ it might not be a whole number of bytes. */
+
+static void
+vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+ unsigned int precision, signop sign)
+{
+ /* Round the precision up to a whole number of bytes. */
+ precision = vect_element_precision (precision);
+ if (precision < TYPE_PRECISION (type)
+ && (!stmt_info->operation_precision
+ || stmt_info->operation_precision > precision))
+ {
+ stmt_info->operation_precision = precision;
+ stmt_info->operation_sign = sign;
+ }
+}
+
+/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+ non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
+ is an arbitrary bit precision; it might not be a whole number of bytes. */
+
+static void
+vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+ unsigned int min_input_precision)
+{
+ /* This operation in isolation only requires the inputs to have
+ MIN_INPUT_PRECISION bits of precision.  However, that doesn't mean
+ that MIN_INPUT_PRECISION is a natural precision for the chain
+ as a whole. E.g. consider something like:
+
+ unsigned short *x, *y;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ The right shift can be done on unsigned chars, and only requires the
+ result of "*x & 0xf0" to be done on unsigned chars. But taking that
+ approach would mean turning a natural chain of single-vector unsigned
+ short operations into one that truncates "*x" and then extends
+ "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+ operation and one vector for each unsigned char operation.
+ This would be a significant pessimization.
+
+ Instead only propagate the maximum of this precision and the precision
+ required by the users of the result. This means that we don't pessimize
+ the case above but continue to optimize things like:
+
+ unsigned char *y;
+ unsigned short *x;
+ *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+ Here we would truncate two vectors of *x to a single vector of
+ unsigned chars and use single-vector unsigned char operations for
+ everything else, rather than doing two unsigned short copies of
+ "(*x & 0xf0) >> 4" and then truncating the result. */
+ min_input_precision = MAX (min_input_precision,
+ stmt_info->min_output_precision);
+
+ if (min_input_precision < TYPE_PRECISION (type)
+ && (!stmt_info->min_input_precision
+ || stmt_info->min_input_precision > min_input_precision))
+ stmt_info->min_input_precision = min_input_precision;
+}
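The comment above is about profitability, not legality: at the value level the "(*x & 0xf0) >> 4" term really can be computed on unsigned chars without changing the unsigned short result. A sketch of that equivalence (illustration only, not part of the patch):

```c
#include <assert.h>

/* The whole expression done on unsigned shorts, as in the first
   example in the comment above.  */
static unsigned short
on_shorts (unsigned short x, unsigned short y)
{
  return (unsigned short) (((x & 0xf0) >> 4) | (y << 4));
}

/* The shifted term computed on an unsigned char instead; the final
   result is unchanged, which is why the narrowing would be legal
   even where it is not profitable.  */
static unsigned short
mixed (unsigned short x, unsigned short y)
{
  unsigned char t = (unsigned char) ((x & 0xf0) >> 4);
  return (unsigned short) (t | (y << 4));
}
```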
+
+/* Subroutine of vect_determine_min_output_precision. Return true if
+ we can calculate a reduced number of output bits for STMT_INFO,
+ whose result is LHS. */
+
+static bool
+vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+{
+ /* Take the maximum precision required by users of the result. */
+ unsigned int precision = 0;
+ imm_use_iterator iter;
+ use_operand_p use;
+ FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+ {
+ gimple *use_stmt = USE_STMT (use);
+ if (is_gimple_debug (use_stmt))
+ continue;
+ if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+ return false;
+ stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+ if (!use_stmt_info->min_input_precision)
+ return false;
+ precision = MAX (precision, use_stmt_info->min_input_precision);
+ }
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+ precision);
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+ dump_printf (MSG_NOTE, " are significant\n");
+ }
+ stmt_info->min_output_precision = precision;
+ return true;
+}
+
+/* Calculate min_output_precision for STMT_INFO. */
+
+static void
+vect_determine_min_output_precision (stmt_vec_info stmt_info)
+{
+ /* We're only interested in statements with a narrowable result. */
+ tree lhs = gimple_get_lhs (stmt_info->stmt);
+ if (!lhs
+ || TREE_CODE (lhs) != SSA_NAME
+ || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+ return;
+
+ if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+ stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+}
+
+/* Use range information to decide whether STMT (described by STMT_INFO)
+ could be done in a narrower type. This is effectively a forward
+ propagation, since it uses context-independent information that applies
+ to all users of an SSA name. */
+
+static void
+vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+{
+ tree lhs = gimple_assign_lhs (stmt);
+ if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+ return;
+
+ tree type = TREE_TYPE (lhs);
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ /* First see whether we have any useful range information for the result. */
+ unsigned int precision = TYPE_PRECISION (type);
+ signop sign = TYPE_SIGN (type);
+ wide_int min_value, max_value;
+ if (!vect_get_range_info (lhs, &min_value, &max_value))
+ return;
+
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int nops = gimple_num_ops (stmt);
+
+ if (!vect_truncatable_operation_p (code))
+ /* Check that all relevant input operands are compatible, and update
+ [MIN_VALUE, MAX_VALUE] to include their ranges. */
+ for (unsigned int i = 1; i < nops; ++i)
+ {
+ tree op = gimple_op (stmt, i);
+ if (TREE_CODE (op) == INTEGER_CST)
+ {
+ /* Don't require the integer to have RHS_TYPE (which it might
+ not for things like shift amounts, etc.), but do require it
+ to fit the type. */
+ if (!int_fits_type_p (op, type))
+ return;
+
+ min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+ max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+ }
+ else if (TREE_CODE (op) == SSA_NAME)
+ {
+ /* Ignore codes that don't take uniform arguments. */
+ if (!types_compatible_p (TREE_TYPE (op), type))
+ return;
+
+ wide_int op_min_value, op_max_value;
+ if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+ return;
+
+ min_value = wi::min (min_value, op_min_value, sign);
+ max_value = wi::max (max_value, op_max_value, sign);
+ }
+ else
+ return;
+ }
+
+ /* Try to switch signed types for unsigned types if we can.
+ This is better for two reasons. First, unsigned ops tend
+ to be cheaper than signed ops. Second, it means that we can
+ handle things like:
+
+ signed char c;
+ int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+
+ as:
+
+ signed char c;
+ unsigned short res_1 = (unsigned short) c & 0xff00;
+ int res = (int) res_1;
+
+ where the intermediate result res_1 has unsigned rather than
+ signed type. */
+ if (sign == SIGNED && !wi::neg_p (min_value))
+ sign = UNSIGNED;
+
+ /* See what precision is required for MIN_VALUE and MAX_VALUE. */
+ unsigned int precision1 = wi::min_precision (min_value, sign);
+ unsigned int precision2 = wi::min_precision (max_value, sign);
+ unsigned int value_precision = MAX (precision1, precision2);
+ if (value_precision >= precision)
+ return;
+
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without loss of precision: ",
+ sign == SIGNED ? "signed" : "unsigned",
+ value_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+
+ vect_set_operation_type (stmt_info, type, value_precision, sign);
+ vect_set_min_input_precision (stmt_info, type, value_precision);
+}
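The signed-to-unsigned switch described in the comment can be spelled out in scalar C (illustration only, not part of the patch): because the range of the result is known to be non-negative, the unsigned short intermediate computes the same values as the original int expression.

```c
#include <assert.h>

/* Original form: result range is [0x0000, 0xff00], never negative.  */
static int
wide_signed (signed char c)
{
  return (int) c & 0xff00;
}

/* Narrowed form with an unsigned intermediate, as in the comment.  */
static int
narrow_unsigned (signed char c)
{
  unsigned short res_1 = (unsigned short) c & 0xff00;
  return (int) res_1;
}
```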
+
+/* Use information about the users of STMT's result to decide whether
+ STMT (described by STMT_INFO) could be done in a narrower type.
+ This is effectively a backward propagation. */
+
+static void
+vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+{
+ tree_code code = gimple_assign_rhs_code (stmt);
+ unsigned int opno = (code == COND_EXPR ? 2 : 1);
+ tree type = TREE_TYPE (gimple_op (stmt, opno));
+ if (!vect_narrowable_type_p (type))
+ return;
+
+ unsigned int precision = TYPE_PRECISION (type);
+ unsigned int operation_precision, min_input_precision;
+ switch (code)
+ {
+ CASE_CONVERT:
+ /* Only the bits that contribute to the output matter. Don't change
+ the precision of the operation itself. */
+ operation_precision = precision;
+ min_input_precision = stmt_info->min_output_precision;
+ break;
+
+ case LSHIFT_EXPR:
+ case RSHIFT_EXPR:
+ {
+ tree shift = gimple_assign_rhs2 (stmt);
+ if (TREE_CODE (shift) != INTEGER_CST
+ || !wi::ltu_p (wi::to_widest (shift), precision))
+ return;
+ unsigned int const_shift = TREE_INT_CST_LOW (shift);
+ if (code == LSHIFT_EXPR)
+ {
+ /* We need CONST_SHIFT fewer bits of the input. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = (MAX (operation_precision, const_shift)
+ - const_shift);
+ }
+ else
+ {
+ /* We need CONST_SHIFT extra bits to do the operation. */
+ operation_precision = (stmt_info->min_output_precision
+ + const_shift);
+ min_input_precision = operation_precision;
+ }
+ break;
+ }
+
+ default:
+ if (vect_truncatable_operation_p (code))
+ {
+ /* Input bit N has no effect on output bits N-1 and lower. */
+ operation_precision = stmt_info->min_output_precision;
+ min_input_precision = operation_precision;
+ break;
+ }
+ return;
+ }
+
+ if (operation_precision < precision)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+ " without affecting users: ",
+ TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+ operation_precision);
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ }
+ vect_set_operation_type (stmt_info, type, operation_precision,
+ TYPE_SIGN (type));
+ }
+ vect_set_min_input_precision (stmt_info, type, min_input_precision);
+}
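The shift bookkeeping above can be sanity-checked at the value level (illustration only, not part of the patch): if only the low 8 bits of "x << 3" are needed, only the low 5 bits of "x" matter, while the low 8 bits of "x >> 3" need the low 11 bits of "x".

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of a left shift by 3: bits of x above bit 4 are
   irrelevant (min_input_precision = 8 - 3 = 5).  */
static uint8_t
lshift_low8 (uint32_t x)
{
  return (uint8_t) (x << 3);
}

/* Low 8 bits of a right shift by 3: the operation needs
   8 + 3 = 11 input bits, and bits above bit 10 are irrelevant.  */
static uint8_t
rshift_low8 (uint32_t x)
{
  return (uint8_t) (x >> 3);
}
```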
+
+/* Handle vect_determine_precisions for STMT_INFO, given that we
+ have already done so for the users of its result. */
+
+void
+vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+{
+ vect_determine_min_output_precision (stmt_info);
+ if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+ {
+ vect_determine_precisions_from_range (stmt_info, stmt);
+ vect_determine_precisions_from_users (stmt_info, stmt);
+ }
+}
+
+/* Walk backwards through the vectorizable region to determine the
+ values of these fields:
+
+ - min_output_precision
+ - min_input_precision
+ - operation_precision
+ - operation_sign. */
+
+void
+vect_determine_precisions (vec_info *vinfo)
+{
+ DUMP_VECT_SCOPE ("vect_determine_precisions");
+
+ if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+ unsigned int nbbs = loop->num_nodes;
+
+ for (unsigned int i = 0; i < nbbs; i++)
+ {
+ basic_block bb = bbs[nbbs - i - 1];
+ for (gimple_stmt_iterator si = gsi_last_bb (bb);
+ !gsi_end_p (si); gsi_prev (&si))
+ vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
+ }
+ }
+ else
+ {
+ bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+ gimple_stmt_iterator si = bb_vinfo->region_end;
+ gimple *stmt;
+ do
+ {
+ if (!gsi_stmt (si))
+ si = gsi_last_bb (bb_vinfo->bb);
+ else
+ gsi_prev (&si);
+ stmt = gsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+ vect_determine_stmt_precisions (stmt_info);
+ }
+ while (stmt != gsi_stmt (bb_vinfo->region_begin));
+ }
+}
+
typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
struct vect_recog_func
@@ -4217,13 +4566,14 @@ struct vect_recog_func
taken which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
+ { vect_recog_over_widening_pattern, "over_widening" },
+ { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
- { vect_recog_over_widening_pattern, "over_widening" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
@@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
unsigned int i, j;
auto_vec<gimple *, 1> stmts_to_replace;
+ vect_determine_precisions (vinfo);
+
DUMP_VECT_SCOPE ("vect_pattern_recog");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c 2016-11-11 17:07:36.776796115 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c 2018-07-03 09:02:36.567413531 +0100
@@ -43,5 +43,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
+/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-07-03 09:02:36.563413564 +0100
@@ -62,8 +62,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-07-03 09:02:36.563413564 +0100
@@ -58,7 +58,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-07-03 09:02:36.563413564 +0100
@@ -57,7 +57,12 @@ int main (void)
return 0;
}
-/* Final value stays in int, so no over-widening is detected at the moment. */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
+/* This is an over-widening even though the final result is still an int.
+ It's better to do one vector of ops on chars and then widen than to
+ widen and then do 4 vectors of ops on ints. */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-07-03 09:02:36.563413564 +0100
@@ -57,7 +57,12 @@ int main (void)
return 0;
}
-/* Final value stays in int, so no over-widening is detected at the moment. */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
+/* This is an over-widening even though the final result is still an int.
+ It's better to do one vector of ops on chars and then widen than to
+ widen and then do 4 vectors of ops on ints. */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-07-03 09:02:36.563413564 +0100
@@ -57,6 +57,9 @@ int main (void)
return 0;
}
-/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-07-03 09:02:36.563413564 +0100
@@ -59,7 +59,9 @@ int main (void)
return 0;
}
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-07-03 09:02:36.563413564 +0100
@@ -66,8 +66,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-07-03 09:01:31.075962445 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-07-03 09:02:36.563413564 +0100
@@ -62,7 +62,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,66 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+/* Deliberate use of signed >>. */
+#define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char *restrict c) \
+ { \
+ a[0] = (b[0] + c[0]) >> 1; \
+ a[1] = (b[1] + c[1]) >> 1; \
+ a[2] = (b[2] + c[2]) >> 1; \
+ a[3] = (b[3] + c[3]) >> 1; \
+ a[4] = (b[4] + c[4]) >> 1; \
+ a[5] = (b[5] + c[5]) >> 1; \
+ a[6] = (b[6] + c[6]) >> 1; \
+ a[7] = (b[7] + c[7]) >> 1; \
+ a[8] = (b[8] + c[8]) >> 1; \
+ a[9] = (b[9] + c[9]) >> 1; \
+ a[10] = (b[10] + c[10]) >> 1; \
+ a[11] = (b[11] + c[11]) >> 1; \
+ a[12] = (b[12] + c[12]) >> 1; \
+ a[13] = (b[13] + c[13]) >> 1; \
+ a[14] = (b[14] + c[14]) >> 1; \
+ a[15] = (b[15] + c[15]) >> 1; \
+ }
+
+DEF_LOOP (signed)
+DEF_LOOP (unsigned)
+
+#define N 16
+
+#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
+ { \
+ SIGNEDNESS char a[N], b[N], c[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ c[i] = BASE_C + i * 14; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, c); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \
+ __builtin_abort (); \
+ }
+
+int
+main (void)
+{
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 10);
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+/* Deliberate use of signed >>. */
+#define DEF_LOOP(SIGNEDNESS) \
+ void __attribute__ ((noipa)) \
+ f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
+ SIGNEDNESS char *restrict b, \
+ SIGNEDNESS char c) \
+ { \
+ a[0] = (b[0] + c) >> 1; \
+ a[1] = (b[1] + c) >> 1; \
+ a[2] = (b[2] + c) >> 1; \
+ a[3] = (b[3] + c) >> 1; \
+ a[4] = (b[4] + c) >> 1; \
+ a[5] = (b[5] + c) >> 1; \
+ a[6] = (b[6] + c) >> 1; \
+ a[7] = (b[7] + c) >> 1; \
+ a[8] = (b[8] + c) >> 1; \
+ a[9] = (b[9] + c) >> 1; \
+ a[10] = (b[10] + c) >> 1; \
+ a[11] = (b[11] + c) >> 1; \
+ a[12] = (b[12] + c) >> 1; \
+ a[13] = (b[13] + c) >> 1; \
+ a[14] = (b[14] + c) >> 1; \
+ a[15] = (b[15] + c) >> 1; \
+ }
+
+DEF_LOOP (signed)
+DEF_LOOP (unsigned)
+
+#define N 16
+
+#define TEST_LOOP(SIGNEDNESS, BASE_B, C) \
+ { \
+ SIGNEDNESS char a[N], b[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ b[i] = BASE_B + i * 15; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ f_##SIGNEDNESS (a, b, C); \
+ for (int i = 0; i < N; ++i) \
+ if (a[i] != (BASE_B + C + i * 15) >> 1) \
+ __builtin_abort (); \
+ }
+
+int
+main (void)
+{
+ check_vect ();
+
+ TEST_LOOP (signed, -128, -120);
+ TEST_LOOP (unsigned, 4, 250);
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+{
+ /* Deliberate use of signed >>. */
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) >> 1;
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+
+#include "vect-over-widen-5.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#define D -120
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, SIGNEDNESS char d)
+{
+ int promoted_d = d;
+ for (int i = 0; i < N; ++i)
+ /* Deliberate use of signed >>. */
+ a[i] = (b[i] + c[i] + promoted_d) >> 2;
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, D);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c 2018-07-03 09:02:36.567413531 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#define D 251
+#endif
+
+#include "vect-over-widen-7.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c 2018-07-03 09:02:36.567413531 +0100
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short. */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-9.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -100
+#endif
+
+#define N 50
+
+/* Both range analysis and backward propagation from the truncation show
+ that these calculations can be done in SIGNEDNESS short, with "res"
+ being extended for the store to d[i]. */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c, int *restrict d)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ /* Deliberate use of signed >>. */
+ int res = b[i] + c[i];
+ a[i] = (res + (res >> 1)) >> 2;
+ d[i] = res;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ int d[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d);
+ for (int i = 0; i < N; ++i)
+ {
+ int res = BASE_B + BASE_C + i * 9;
+ if (a[i] != ((res + (res >> 1)) >> 2))
+ __builtin_abort ();
+ if (d[i] != res)
+ __builtin_abort ();
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-11.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -120
+#endif
+
+#define N 50
+
+/* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short. */
+void __attribute__ ((noipa))
+f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+{
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ SIGNEDNESS char a[N], b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-13.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS signed
+#define BASE_B -128
+#define BASE_C -120
+#endif
+
+#define N 50
+
+/* We rely on range analysis to show that these calculations can be done
+ in SIGNEDNESS short, with the result being extended to int for the
+ store. */
+void __attribute__ ((noipa))
+f (int *restrict a, SIGNEDNESS char *restrict b,
+ SIGNEDNESS char *restrict c)
+{
+ for (int i = 0; i < N; ++i)
+ a[i] = (b[i] + c[i]) / 2;
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ int a[N];
+ SIGNEDNESS char b[N], c[N];
+ for (int i = 0; i < N; ++i)
+ {
+ b[i] = BASE_B + i * 5;
+ c[i] = BASE_C + i * 4;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#ifndef SIGNEDNESS
+#define SIGNEDNESS unsigned
+#define BASE_B 4
+#define BASE_C 40
+#endif
+
+#include "vect-over-widen-15.c"
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,46 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 1024
+
+/* This should not be treated as an over-widening pattern, even though
+ "(b[i] & 0xef) | 0x80" could be done in unsigned chars. */
+
+void __attribute__ ((noipa))
+f (unsigned short *restrict a, unsigned short *restrict b)
+{
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ unsigned short a[N], b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 1024
+
+/* This should be treated as an over-widening pattern: we can truncate
+ b to unsigned char after loading it and do all the computation in
+ unsigned char. */
+
+void __attribute__ ((noipa))
+f (unsigned char *restrict a, unsigned short *restrict b)
+{
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
+ a[i] = foo;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ unsigned char a[N];
+ unsigned short b[N];
+ for (int i = 0; i < N; ++i)
+ {
+ a[i] = i;
+ b[i] = i * 3;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \|} "vect" } } */
+/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
+/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+{
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ unsigned int di = d[i];
+ unsigned int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+{
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ int di = d[i];
+ int ei = e[i];
+ a[i] = di;
+ b[i] = ei;
+ c[i] = di + ei;
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
===================================================================
--- /dev/null 2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c 2018-07-03 09:02:36.563413564 +0100
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+#define N 111
+
+/* This shouldn't be treated as an over-widening operation: it's better
+ to reuse the extensions of di and ei for di + ei than to add them
+ as shorts and introduce a third extension. */
+
+void __attribute__ ((noipa))
+f (unsigned int *restrict a, unsigned int *restrict b,
+ unsigned int *restrict c, unsigned char *restrict d,
+ unsigned char *restrict e)
+{
+ for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+ {
+ a[i] = d[i];
+ b[i] = e[i];
+ c[i] = d[i] + e[i];
+ }
+}
+
+int
+main (void)
+{
+ check_vect ();
+
+ unsigned int a[N], b[N], c[N];
+ unsigned char d[N], e[N];
+ for (int i = 0; i < N; ++i)
+ {
+ d[i] = i * 2 + 3;
+ e[i] = i + 100;
+ asm volatile ("" ::: "memory");
+ }
+ f (a, b, c, d, e);
+ for (int i = 0; i < N; ++i)
+ if (a[i] != i * 2 + 3
+ || b[i] != i + 100
+ || c[i] != i * 3 + 103)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-03 10:02 ` Richard Sandiford
@ 2018-07-03 20:08 ` Christophe Lyon
2018-07-03 20:39 ` Rainer Orth
2018-07-04 7:18 ` Richard Sandiford
0 siblings, 2 replies; 10+ messages in thread
From: Christophe Lyon @ 2018-07-03 20:08 UTC (permalink / raw)
To: Richard Biener, gcc Patches, Richard Sandiford
On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> > This patch is the main part of PR85694. The aim is to recognise at least:
> >> >
> >> > signed char *a, *b, *c;
> >> > ...
> >> > for (int i = 0; i < 2048; i++)
> >> > c[i] = (a[i] + b[i]) >> 1;
> >> >
> >> > as an over-widening pattern, since the addition and shift can be done
> >> > on shorts rather than ints. However, it ended up being a lot more
> >> > general than that.
> >> >
> >> > The current over-widening pattern detection is limited to a few simple
> >> > cases: logical ops with immediate second operands, and shifts by a
> >> > constant. These cases are enough for common pixel-format conversion
> >> > and can be detected in a peephole way.
> >> >
> >> > The loop above requires two generalisations of the current code: support
> >> > for addition as well as logical ops, and support for non-constant second
> >> > operands. These are harder to detect in the same peephole way, so the
> >> > patch tries to take a more global approach.
> >> >
> >> > The idea is to get information about the minimum operation width
> >> > in two ways:
> >> >
> >> > (1) by using the range information attached to the SSA_NAMEs
> >> > (effectively a forward walk, since the range info is
> >> > context-independent).
> >> >
> >> > (2) by back-propagating the number of output bits required by
> >> > users of the result.
> >> >
> >> > As explained in the comments, there's a balance to be struck between
> >> > narrowing an individual operation and fitting in with the surrounding
> >> > code. The approach is pretty conservative: if we could narrow an
> >> > operation to N bits without changing its semantics, it's OK to do that if:
> >> >
> >> > - no operations later in the chain require more than N bits; or
> >> >
> >> > - all internally-defined inputs are extended from N bits or fewer,
> >> > and at least one of them is single-use.
> >> >
> >> > See the comments for the rationale.
> >> >
> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> >> > since the code seemed more readable without.
> >> >
> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
> >>
> >> Here's a version rebased on top of current trunk. Changes from last time:
> >>
> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
> >> prototype
> >>
> >> - fix a typo in a comment
> >>
> >> - use vect_element_precision from the new version of 12/n.
> >>
> >> Tested as before. OK to install?
> >
> > OK.
>
> Thanks. For the record, here's what I installed (updated on top of
> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>
> Richard
>
Hi,
It seems the new bb-slp-over-widen tests lack a -fdump option:
gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
does not exist
UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "basic block vectorized" 2
Christophe
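A possible fix, sketched with standard DejaGNU directives (not necessarily the change that was committed), would be either to request the slp2 dump explicitly in each bb-slp-over-widen test, or to scan the vect dump that the gcc.dg/vect harness already enables:

```c
/* Option 1: add the missing dump flag to the test itself.  */
/* { dg-additional-options "-fdump-tree-slp2-details" } */

/* Option 2: scan the dump the vect testsuite already produces,
   e.g. for bb-slp-over-widen-2.c:  */
/* { dg-final { scan-tree-dump "demoting int to signed short" "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "vect" { target { ! vect_widen_shift } } } } */
```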
>
> 2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * poly-int.h (print_hex): New function.
> * dumpfile.h (dump_dec, dump_hex): Declare.
> * dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
> * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
> min_input_precision, operation_precision and operation_sign.
> * tree-vect-patterns.c (vect_get_range_info): New function.
> (vect_same_loop_or_bb_p, vect_single_imm_use)
> (vect_operation_fits_smaller_type): Delete.
> (vect_look_through_possible_promotion): Add an optional
> single_use_p parameter.
> (vect_recog_over_widening_pattern): Rewrite to use new
> stmt_vec_info information. Handle one operation at a time.
> (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
> (vect_truncatable_operation_p, vect_set_operation_type)
> (vect_set_min_input_precision): New functions.
> (vect_determine_min_output_precision_1): Likewise.
> (vect_determine_min_output_precision): Likewise.
> (vect_determine_precisions_from_range): Likewise.
> (vect_determine_precisions_from_users): Likewise.
> (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
> (vect_vect_recog_func_ptrs): Put over_widening first.
> Add cast_forwprop.
> (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
> widen_mult pattern.
> * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
> over-widening messages.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
> * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-5.c: Likewise.
> * gcc.dg/vect/vect-over-widen-6.c: Likewise.
> * gcc.dg/vect/vect-over-widen-7.c: Likewise.
> * gcc.dg/vect/vect-over-widen-8.c: Likewise.
> * gcc.dg/vect/vect-over-widen-9.c: Likewise.
> * gcc.dg/vect/vect-over-widen-10.c: Likewise.
> * gcc.dg/vect/vect-over-widen-11.c: Likewise.
> * gcc.dg/vect/vect-over-widen-12.c: Likewise.
> * gcc.dg/vect/vect-over-widen-13.c: Likewise.
> * gcc.dg/vect/vect-over-widen-14.c: Likewise.
> * gcc.dg/vect/vect-over-widen-15.c: Likewise.
> * gcc.dg/vect/vect-over-widen-16.c: Likewise.
> * gcc.dg/vect/vect-over-widen-17.c: Likewise.
> * gcc.dg/vect/vect-over-widen-18.c: Likewise.
> * gcc.dg/vect/vect-over-widen-19.c: Likewise.
> * gcc.dg/vect/vect-over-widen-20.c: Likewise.
> * gcc.dg/vect/vect-over-widen-21.c: Likewise.
> ------------------------------------------------------------------------------
>
> Index: gcc/poly-int.h
> ===================================================================
> --- gcc/poly-int.h 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/poly-int.h 2018-07-03 09:02:36.563413564 +0100
> @@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &val
> poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
> }
>
> +/* Use print_hex to print VALUE to FILE. */
> +
> +template<unsigned int N, typename C>
> +void
> +print_hex (const poly_int_pod<N, C> &value, FILE *file)
> +{
> + if (value.is_constant ())
> + print_hex (value.coeffs[0], file);
> + else
> + {
> + fprintf (file, "[");
> + for (unsigned int i = 0; i < N; ++i)
> + {
> + print_hex (value.coeffs[i], file);
> + fputc (i == N - 1 ? ']' : ',', file);
> + }
> + }
> +}
> +
> /* Helper for calculating the distance between two points P1 and P2,
> in cases where known_le (P1, P2). T1 and T2 are the types of the
> two positions, in either order. The coefficients of P2 - P1 have
> Index: gcc/dumpfile.h
> ===================================================================
> --- gcc/dumpfile.h 2018-07-02 14:30:09.280175397 +0100
> +++ gcc/dumpfile.h 2018-07-03 09:02:36.563413564 +0100
> @@ -436,6 +436,8 @@ extern bool enable_rtl_dump_file (void);
>
> template<unsigned int N, typename C>
> void dump_dec (dump_flags_t, const poly_int<N, C> &);
> +extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
> +extern void dump_hex (dump_flags_t, const poly_wide_int &);
>
> /* In tree-dump.c */
> extern void dump_node (const_tree, dump_flags_t, FILE *);
> Index: gcc/dumpfile.c
> ===================================================================
> --- gcc/dumpfile.c 2018-07-03 09:01:31.071962478 +0100
> +++ gcc/dumpfile.c 2018-07-03 09:02:36.563413564 +0100
> @@ -597,6 +597,28 @@ template void dump_dec (dump_flags_t, co
> template void dump_dec (dump_flags_t, const poly_offset_int &);
> template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> +void
> +dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
> +{
> + if (dump_file && (dump_kind & pflags))
> + print_dec (value, dump_file, sgn);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_dec (value, alt_dump_file, sgn);
> +}
> +
> +/* Output VALUE in hexadecimal to appropriate dump streams. */
> +
> +void
> +dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
> +{
> + if (dump_file && (dump_kind & pflags))
> + print_hex (value, dump_file);
> +
> + if (alt_dump_file && (dump_kind & alt_flags))
> + print_hex (value, alt_dump_file);
> +}
> +
> /* The current dump scope-nesting depth. */
>
> static int dump_scope_depth;
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h 2018-07-03 09:01:31.079962411 +0100
> +++ gcc/tree-vectorizer.h 2018-07-03 09:02:36.567413531 +0100
> @@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
>
> /* The number of scalar stmt references from active SLP instances. */
> unsigned int num_slp_uses;
> +
> + /* If nonzero, the lhs of the statement could be truncated to this
> + many bits without affecting any users of the result. */
> + unsigned int min_output_precision;
> +
> + /* If nonzero, all non-boolean input operands have the same precision,
> + and they could each be truncated to this many bits without changing
> + the result. */
> + unsigned int min_input_precision;
> +
> + /* If OPERATION_PRECISION is nonzero, the statement could be performed on
> + an integer with the sign and number of bits given by OPERATION_SIGN
> + and OPERATION_PRECISION without changing the result. */
> + unsigned int operation_precision;
> + signop operation_sign;
> } *stmt_vec_info;
>
> /* Information about a gather/scatter call. */
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c 2018-07-03 09:01:31.035962780 +0100
> +++ gcc/tree-vect-patterns.c 2018-07-03 09:02:36.567413531 +0100
> @@ -47,6 +47,40 @@ Software Foundation; either version 3, o
> #include "omp-simd-clone.h"
> #include "predict.h"
>
> +/* Return true if we have a useful VR_RANGE range for VAR, storing it
> + in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
> +
> +static bool
> +vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> +{
> + value_range_type vr_type = get_range_info (var, min_value, max_value);
> + wide_int nonzero = get_nonzero_bits (var);
> + signop sgn = TYPE_SIGN (TREE_TYPE (var));
> + if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
> + nonzero, sgn) == VR_RANGE)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has range [");
> + dump_hex (MSG_NOTE, *min_value);
> + dump_printf (MSG_NOTE, ", ");
> + dump_hex (MSG_NOTE, *max_value);
> + dump_printf (MSG_NOTE, "]\n");
> + }
> + return true;
> + }
> + else
> + {
> + if (dump_enabled_p ())
> + {
> + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
> + dump_printf (MSG_NOTE, " has no range info\n");
> + }
> + return false;
> + }
> +}
> +
> /* Report that we've found an instance of pattern PATTERN in
> statement STMT. */
>
> @@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree ot
> return true;
> }
>
> -/* Check whether STMT2 is in the same loop or basic block as STMT1.
> - Which of the two applies depends on whether we're currently doing
> - loop-based or basic-block-based vectorization, as determined by
> - the vinfo_for_stmt for STMT1 (which must be defined).
> -
> - If this returns true, vinfo_for_stmt for STMT2 is guaranteed
> - to be defined as well. */
> -
> -static bool
> -vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> -{
> - stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> - return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> -}
> -
> -/* If the LHS of DEF_STMT has a single use, and that statement is
> - in the same loop or basic block, return it. */
> -
> -static gimple *
> -vect_single_imm_use (gimple *def_stmt)
> -{
> - tree lhs = gimple_assign_lhs (def_stmt);
> - use_operand_p use_p;
> - gimple *use_stmt;
> -
> - if (!single_imm_use (lhs, &use_p, &use_stmt))
> - return NULL;
> -
> - if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
> - return NULL;
> -
> - return use_stmt;
> -}
> -
> /* Round bit precision PRECISION up to a full element. */
>
> static unsigned int
> @@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_i
> is possible to convert OP' back to OP using a possible sign change
> followed by a possible promotion P. Return this OP', or null if OP is
> not a vectorizable SSA name. If there is a promotion P, describe its
> - input in UNPROM, otherwise describe OP' in UNPROM.
> + input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
> + is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
> + have more than one user.
>
> A successful return means that it is possible to go from OP' to OP
> via UNPROM. The cast from OP' to UNPROM is at most a sign change,
> @@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_i
>
> static tree
> vect_look_through_possible_promotion (vec_info *vinfo, tree op,
> - vect_unpromoted_value *unprom)
> + vect_unpromoted_value *unprom,
> + bool *single_use_p = NULL)
> {
> tree res = NULL_TREE;
> tree op_type = TREE_TYPE (op);
> @@ -420,7 +423,14 @@ vect_look_through_possible_promotion (ve
> if (!def_stmt)
> break;
> if (dt == vect_internal_def)
> - caster = vinfo_for_stmt (def_stmt);
> + {
> + caster = vinfo_for_stmt (def_stmt);
> + /* Ignore pattern statements, since we don't link uses for them. */
> + if (single_use_p
> + && !STMT_VINFO_RELATED_STMT (caster)
> + && !has_single_use (res))
> + *single_use_p = false;
> + }
> else
> caster = NULL;
> gassign *assign = dyn_cast <gassign *> (def_stmt);
> @@ -1371,363 +1381,318 @@ vect_recog_widen_sum_pattern (vec<gimple
> return pattern_stmt;
> }
>
> +/* Recognize cases in which an operation is performed in one type WTYPE
> + but could be done more efficiently in a narrower type NTYPE. For example,
> + if we have:
> +
> + ATYPE a; // narrower than NTYPE
> + BTYPE b; // narrower than NTYPE
> + WTYPE aw = (WTYPE) a;
> + WTYPE bw = (WTYPE) b;
> + WTYPE res = aw + bw; // only uses of aw and bw
> +
> + then it would be more efficient to do:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b;
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + Other situations include things like:
> +
> + ATYPE a; // NTYPE or narrower
> + WTYPE aw = (WTYPE) a;
> + WTYPE res = aw + b;
> +
> + when only "(NTYPE) res" is significant. In that case it's more efficient
> + to truncate "b" and do the operation on NTYPE instead:
> +
> + NTYPE an = (NTYPE) a;
> + NTYPE bn = (NTYPE) b; // truncation
> + NTYPE resn = an + bn;
> + WTYPE res = (WTYPE) resn;
> +
> + All users of "res" should then use "resn" instead, making the final
> + statement dead (not marked as relevant). The final statement is still
> + needed to maintain the type correctness of the IR.
> +
> + vect_determine_precisions has already determined the minimum
> + precision of the operation and the minimum precision required
> + by users of the result. */
>
> -/* Return TRUE if the operation in STMT can be performed on a smaller type.
> +static gimple *
> +vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> +{
> + gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> + if (!last_stmt)
> + return NULL;
>
> - Input:
> - STMT - a statement to check.
> - DEF - we support operations with two operands, one of which is constant.
> - The other operand can be defined by a demotion operation, or by a
> - previous statement in a sequence of over-promoted operations. In the
> - later case DEF is used to replace that operand. (It is defined by a
> - pattern statement we created for the previous statement in the
> - sequence).
> -
> - Input/output:
> - NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
> - NULL, it's the type of DEF.
> - STMTS - additional pattern statements. If a pattern statement (type
> - conversion) is created in this function, its original statement is
> - added to STMTS.
> + /* See whether we have found that this operation can be done on a
> + narrower type without changing its semantics. */
> + stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> + unsigned int new_precision = last_stmt_info->operation_precision;
> + if (!new_precision)
> + return NULL;
>
> - Output:
> - OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
> - operands to use in the new pattern statement for STMT (will be created
> - in vect_recog_over_widening_pattern ()).
> - NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
> - statements for STMT: the first one is a type promotion and the second
> - one is the operation itself. We return the type promotion statement
> - in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
> - the second pattern statement. */
> + vec_info *vinfo = last_stmt_info->vinfo;
> + tree lhs = gimple_assign_lhs (last_stmt);
> + tree type = TREE_TYPE (lhs);
> + tree_code code = gimple_assign_rhs_code (last_stmt);
> +
> + /* Keep the first operand of a COND_EXPR as-is: only the other two
> + operands are interesting. */
> + unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> -static bool
> -vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
> - tree *op0, tree *op1, gimple **new_def_stmt,
> - vec<gimple *> *stmts)
> -{
> - enum tree_code code;
> - tree const_oprnd, oprnd;
> - tree interm_type = NULL_TREE, half_type, new_oprnd, type;
> - gimple *def_stmt, *new_stmt;
> - bool first = false;
> - bool promotion;
> + /* Check the operands. */
> + unsigned int nops = gimple_num_ops (last_stmt) - first_op;
> + auto_vec <vect_unpromoted_value, 3> unprom (nops);
> + unprom.quick_grow (nops);
> + unsigned int min_precision = 0;
> + bool single_use_p = false;
> + for (unsigned int i = 0; i < nops; ++i)
> + {
> + tree op = gimple_op (last_stmt, first_op + i);
> + if (TREE_CODE (op) == INTEGER_CST)
> + unprom[i].set_op (op, vect_constant_def);
> + else if (TREE_CODE (op) == SSA_NAME)
> + {
> + bool op_single_use_p = true;
> + if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
> + &op_single_use_p))
> + return NULL;
> + /* If:
>
> - *op0 = NULL_TREE;
> - *op1 = NULL_TREE;
> - *new_def_stmt = NULL;
> + (1) N bits of the result are needed;
> + (2) all inputs are widened from M<N bits; and
> + (3) one operand OP is a single-use SSA name
> +
> + we can shift the M->N widening from OP to the output
> + without changing the number or type of extensions involved.
> + This then reduces the number of copies of STMT_INFO.
> +
> + If instead of (3) more than one operand is a single-use SSA name,
> + shifting the extension to the output is even more of a win.
> +
> + If instead:
> +
> + (1) N bits of the result are needed;
> + (2) one operand OP2 is widened from M2<N bits;
> + (3) another operand OP1 is widened from M1<M2 bits; and
> + (4) both OP1 and OP2 are single-use
> +
> + the choice is between:
> +
> + (a) truncating OP2 to M1, doing the operation on M1,
> + and then widening the result to N
> +
> + (b) widening OP1 to M2, doing the operation on M2, and then
> + widening the result to N
> +
> + Both shift the M2->N widening of the inputs to the output.
> + (a) additionally shifts the M1->M2 widening to the output;
> + it requires fewer copies of STMT_INFO but requires an extra
> + M2->M1 truncation.
> +
> + Which is better will depend on the complexity and cost of
> + STMT_INFO, which is hard to predict at this stage. However,
> + a clear tie-breaker in favor of (b) is the fact that the
> + truncation in (a) increases the length of the operation chain.
> +
> + If instead of (4) only one of OP1 or OP2 is single-use,
> + (b) is still a win over doing the operation in N bits:
> + it still shifts the M2->N widening on the single-use operand
> + to the output and reduces the number of STMT_INFO copies.
> +
> + If neither operand is single-use then operating on fewer than
> + N bits might lead to more extensions overall. Whether it does
> + or not depends on global information about the vectorization
> + region, and whether that's a good trade-off would again
> + depend on the complexity and cost of the statements involved,
> + as well as things like register pressure that are not normally
> + modelled at this stage. We therefore ignore these cases
> + and just optimize the clear single-use wins above.
> +
> + Thus we take the maximum precision of the unpromoted operands
> + and record whether any operand is single-use. */
> + if (unprom[i].dt == vect_internal_def)
> + {
> + min_precision = MAX (min_precision,
> + TYPE_PRECISION (unprom[i].type));
> + single_use_p |= op_single_use_p;
> + }
> + }
> + }
>
> - if (!is_gimple_assign (stmt))
> - return false;
> + /* Although the operation could be done in operation_precision, we have
> + to balance that against introducing extra truncations or extensions.
> + Calculate the minimum precision that can be handled efficiently.
> +
> + The loop above determined that the operation could be handled
> + efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
> + extension from the inputs to the output without introducing more
> + instructions, and would reduce the number of instructions required
> + for STMT_INFO itself.
> +
> + vect_determine_precisions has also determined that the result only
> + needs min_output_precision bits. Truncating by a factor of N times
> + requires a tree of N - 1 instructions, so if TYPE is N times wider
> + than min_output_precision, doing the operation in TYPE and truncating
> + the result requires N + (N - 1) = 2N - 1 instructions per output vector.
> + In contrast:
> +
> + - truncating the input to a unary operation and doing the operation
> + in the new type requires at most N - 1 + 1 = N instructions per
> + output vector
> +
> + - doing the same for a binary operation requires at most
> + (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
> +
> + Both unary and binary operations require fewer instructions than
> + this if the operands were extended from a suitable truncated form.
> + Thus there is usually nothing to lose by doing operations in
> + min_output_precision bits, but there can be something to gain. */
> + if (!single_use_p)
> + min_precision = last_stmt_info->min_output_precision;
> + else
> + min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
>
> - code = gimple_assign_rhs_code (stmt);
> - if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
> - && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
> - return false;
> + /* Apply the minimum efficient precision we just calculated. */
> + if (new_precision < min_precision)
> + new_precision = min_precision;
> + if (new_precision >= TYPE_PRECISION (type))
> + return NULL;
>
> - oprnd = gimple_assign_rhs1 (stmt);
> - const_oprnd = gimple_assign_rhs2 (stmt);
> - type = gimple_expr_type (stmt);
> + vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
>
> - if (TREE_CODE (oprnd) != SSA_NAME
> - || TREE_CODE (const_oprnd) != INTEGER_CST)
> - return false;
> + *type_out = get_vectype_for_scalar_type (type);
> + if (!*type_out)
> + return NULL;
>
> - /* If oprnd has other uses besides that in stmt we cannot mark it
> - as being part of a pattern only. */
> - if (!has_single_use (oprnd))
> - return false;
> + /* We've found a viable pattern. Get the new type of the operation. */
> + bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
> + tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
> +
> + /* We specifically don't check here whether the target supports the
> + new operation, since it might be something that a later pattern
> + wants to rewrite anyway. If targets have a minimum element size
> + for some optabs, we should pattern-match smaller ops to larger ops
> + where beneficial. */
> + tree new_vectype = get_vectype_for_scalar_type (new_type);
> + if (!new_vectype)
> + return NULL;
>
> - /* If we are in the middle of a sequence, we use DEF from a previous
> - statement. Otherwise, OPRND has to be a result of type promotion. */
> - if (*new_type)
> - {
> - half_type = *new_type;
> - oprnd = def;
> - }
> - else
> + if (dump_enabled_p ())
> {
> - first = true;
> - if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
> - &promotion)
> - || !promotion
> - || !vect_same_loop_or_bb_p (stmt, def_stmt))
> - return false;
> + dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
> + dump_printf (MSG_NOTE, " to ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
> + dump_printf (MSG_NOTE, "\n");
> }
>
> - /* Can we perform the operation on a smaller type? */
> - switch (code)
> - {
> - case BIT_IOR_EXPR:
> - case BIT_XOR_EXPR:
> - case BIT_AND_EXPR:
> - if (!int_fits_type_p (const_oprnd, half_type))
> - {
> - /* HALF_TYPE is not enough. Try a bigger type if possible. */
> - if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> - return false;
> -
> - interm_type = build_nonstandard_integer_type (
> - TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> - if (!int_fits_type_p (const_oprnd, interm_type))
> - return false;
> - }
> -
> - break;
> -
> - case LSHIFT_EXPR:
> - /* Try intermediate type - HALF_TYPE is not enough for sure. */
> - if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> - return false;
> -
> - /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
> - (e.g., if the original value was char, the shift amount is at most 8
> - if we want to use short). */
> - if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
> - return false;
> -
> - interm_type = build_nonstandard_integer_type (
> - TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> -
> - if (!vect_supportable_shift (code, interm_type))
> - return false;
> -
> - break;
> -
> - case RSHIFT_EXPR:
> - if (vect_supportable_shift (code, half_type))
> - break;
> -
> - /* Try intermediate type - HALF_TYPE is not supported. */
> - if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
> - return false;
> -
> - interm_type = build_nonstandard_integer_type (
> - TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
> -
> - if (!vect_supportable_shift (code, interm_type))
> - return false;
> -
> - break;
> -
> - default:
> - gcc_unreachable ();
> - }
> -
> - /* There are four possible cases:
> - 1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
> - the first statement in the sequence)
> - a. The original, HALF_TYPE, is not enough - we replace the promotion
> - from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
> - b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
> - promotion.
> - 2. OPRND is defined by a pattern statement we created.
> - a. Its type is not sufficient for the operation, we create a new stmt:
> - a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
> - this statement in NEW_DEF_STMT, and it is later put in
> - STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
> - b. OPRND is good to use in the new statement. */
> - if (first)
> - {
> - if (interm_type)
> - {
> - /* Replace the original type conversion HALF_TYPE->TYPE with
> - HALF_TYPE->INTERM_TYPE. */
> - if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
> - {
> - new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
> - /* Check if the already created pattern stmt is what we need. */
> - if (!is_gimple_assign (new_stmt)
> - || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
> - || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
> - return false;
> -
> - stmts->safe_push (def_stmt);
> - oprnd = gimple_assign_lhs (new_stmt);
> - }
> - else
> - {
> - /* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
> - oprnd = gimple_assign_rhs1 (def_stmt);
> - new_oprnd = make_ssa_name (interm_type);
> - new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> - STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
> - stmts->safe_push (def_stmt);
> - oprnd = new_oprnd;
> - }
> - }
> - else
> - {
> - /* Retrieve the operand before the type promotion. */
> - oprnd = gimple_assign_rhs1 (def_stmt);
> - }
> - }
> - else
> - {
> - if (interm_type)
> - {
> - /* Create a type conversion HALF_TYPE->INTERM_TYPE. */
> - new_oprnd = make_ssa_name (interm_type);
> - new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
> - oprnd = new_oprnd;
> - *new_def_stmt = new_stmt;
> - }
> + /* Calculate the rhs operands for an operation on NEW_TYPE. */
> + STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
> + tree ops[3] = {};
> + for (unsigned int i = 1; i < first_op; ++i)
> + ops[i - 1] = gimple_op (last_stmt, i);
> + vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
> + new_type, &unprom[0], new_vectype);
> +
> + /* Use the operation to produce a result of type NEW_TYPE. */
> + tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> + gimple *pattern_stmt = gimple_build_assign (new_var, code,
> + ops[0], ops[1], ops[2]);
> + gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> - /* Otherwise, OPRND is already set. */
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "created pattern stmt: ");
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> }
>
> - if (interm_type)
> - *new_type = interm_type;
> - else
> - *new_type = half_type;
> + pattern_stmt = vect_convert_output (last_stmt_info, type,
> + pattern_stmt, new_vectype);
>
> - *op0 = oprnd;
> - *op1 = fold_convert (*new_type, const_oprnd);
> -
> - return true;
> + stmts->safe_push (last_stmt);
> + return pattern_stmt;
> }
>
> +/* Recognize cases in which the input to a cast is wider than its
> + output, and the input is fed by a widening operation. Fold this
> + by removing the unnecessary intermediate widening. E.g.:
>
> -/* Try to find a statement or a sequence of statements that can be performed
> - on a smaller type:
> + unsigned char a;
> + unsigned int b = (unsigned int) a;
> + unsigned short c = (unsigned short) b;
>
> - type x_t;
> - TYPE x_T, res0_T, res1_T;
> - loop:
> - S1 x_t = *p;
> - S2 x_T = (TYPE) x_t;
> - S3 res0_T = op (x_T, C0);
> - S4 res1_T = op (res0_T, C1);
> - S5 ... = () res1_T; - type demotion
> -
> - where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
> - constants.
> - Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
> - be 'type' or some intermediate type. For now, we expect S5 to be a type
> - demotion operation. We also check that S3 and S4 have only one use. */
> + -->
>
> -static gimple *
> -vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
> -{
> - gimple *stmt = stmts->pop ();
> - gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
> - *use_stmt = NULL;
> - tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
> - tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
> - bool first;
> - tree type = NULL;
> -
> - first = true;
> - while (1)
> - {
> - if (!vinfo_for_stmt (stmt)
> - || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
> - return NULL;
> -
> - new_def_stmt = NULL;
> - if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
> - &op0, &op1, &new_def_stmt,
> - stmts))
> - {
> - if (first)
> - return NULL;
> - else
> - break;
> - }
> + unsigned short c = (unsigned short) a;
>
> - /* STMT can be performed on a smaller type. Check its uses. */
> - use_stmt = vect_single_imm_use (stmt);
> - if (!use_stmt || !is_gimple_assign (use_stmt))
> - return NULL;
> -
> - /* Create pattern statement for STMT. */
> - vectype = get_vectype_for_scalar_type (new_type);
> - if (!vectype)
> - return NULL;
> -
> - /* We want to collect all the statements for which we create pattern
> - statetments, except for the case when the last statement in the
> - sequence doesn't have a corresponding pattern statement. In such
> - case we associate the last pattern statement with the last statement
> - in the sequence. Therefore, we only add the original statement to
> - the list if we know that it is not the last. */
> - if (prev_stmt)
> - stmts->safe_push (prev_stmt);
> + Although this is rare in input IR, it is an expected side-effect
> + of the over-widening pattern above.
>
> - var = vect_recog_temp_ssa_var (new_type, NULL);
> - pattern_stmt
> - = gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
> - STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
> - new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
> + This is beneficial also for integer-to-float conversions, if the
> + widened integer has more bits than the float, and if the unwidened
> + input doesn't. */
>
> - if (dump_enabled_p ())
> - {
> - dump_printf_loc (MSG_NOTE, vect_location,
> - "created pattern stmt: ");
> - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
> - }
> +static gimple *
> +vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
> +{
> + /* Check for a cast, including an integer-to-float conversion. */
> + gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
> + if (!last_stmt)
> + return NULL;
> + tree_code code = gimple_assign_rhs_code (last_stmt);
> + if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
> + return NULL;
>
> - type = gimple_expr_type (stmt);
> - prev_stmt = stmt;
> - stmt = use_stmt;
> -
> - first = false;
> - }
> -
> - /* We got a sequence. We expect it to end with a type demotion operation.
> - Otherwise, we quit (for now). There are three possible cases: the
> - conversion is to NEW_TYPE (we don't do anything), the conversion is to
> - a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
> - NEW_TYPE differs (we create a new conversion statement). */
> - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
> - {
> - use_lhs = gimple_assign_lhs (use_stmt);
> - use_type = TREE_TYPE (use_lhs);
> - /* Support only type demotion or signedess change. */
> - if (!INTEGRAL_TYPE_P (use_type)
> - || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
> - return NULL;
> + /* Make sure that the rhs is a scalar with a natural bitsize. */
> + tree lhs = gimple_assign_lhs (last_stmt);
> + if (!lhs)
> + return NULL;
> + tree lhs_type = TREE_TYPE (lhs);
> + scalar_mode lhs_mode;
> + if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
> + || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
> + return NULL;
>
> - /* Check that NEW_TYPE is not bigger than the conversion result. */
> - if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
> - return NULL;
> + /* Check for a narrowing operation (from a vector point of view). */
> + tree rhs = gimple_assign_rhs1 (last_stmt);
> + tree rhs_type = TREE_TYPE (rhs);
> + if (!INTEGRAL_TYPE_P (rhs_type)
> + || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
> + || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
> + return NULL;
>
> - if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
> - || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
> - {
> - *type_out = get_vectype_for_scalar_type (use_type);
> - if (!*type_out)
> - return NULL;
> + /* Try to find an unpromoted input. */
> + stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
> + vec_info *vinfo = last_stmt_info->vinfo;
> + vect_unpromoted_value unprom;
> + if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
> + || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
> + return NULL;
>
> - /* Create NEW_TYPE->USE_TYPE conversion. */
> - new_oprnd = make_ssa_name (use_type);
> - pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
> - STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
> -
> - /* We created a pattern statement for the last statement in the
> - sequence, so we don't need to associate it with the pattern
> - statement created for PREV_STMT. Therefore, we add PREV_STMT
> - to the list in order to mark it later in vect_pattern_recog_1. */
> - if (prev_stmt)
> - stmts->safe_push (prev_stmt);
> - }
> - else
> - {
> - if (prev_stmt)
> - STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
> - = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
> + /* If the bits above RHS_TYPE matter, make sure that they're the
> + same when extending from UNPROM as they are when extending from RHS. */
> + if (!INTEGRAL_TYPE_P (lhs_type)
> + && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
> + return NULL;
>
> - *type_out = vectype;
> - }
> + /* We can get the same result by casting UNPROM directly, to avoid
> + the unnecessary widening and narrowing. */
> + vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
>
> - stmts->safe_push (use_stmt);
> - }
> - else
> - /* TODO: support general case, create a conversion to the correct type. */
> + *type_out = get_vectype_for_scalar_type (lhs_type);
> + if (!*type_out)
> return NULL;
>
> - /* Pattern detected. */
> - vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
> + tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> + gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
> + gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>
> + stmts->safe_push (last_stmt);
> return pattern_stmt;
> }
>
> @@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<g
> return pattern_stmt;
> }
>
> +/* Return true if TYPE is a non-boolean integer type. These are the types
> + that we want to consider for narrowing. */
> +
> +static bool
> +vect_narrowable_type_p (tree type)
> +{
> + return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
> +}
> +
> +/* Return true if the operation given by CODE can be truncated to N bits
> + when only N bits of the output are needed. This is only true if bit N+1
> + of the inputs has no effect on the low N bits of the result. */
> +
> +static bool
> +vect_truncatable_operation_p (tree_code code)
> +{
> + switch (code)
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> + case MULT_EXPR:
> + case BIT_AND_EXPR:
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + case COND_EXPR:
> + return true;
> +
> + default:
> + return false;
> + }
> +}
> +
> +/* Record that STMT_INFO could be changed from operating on TYPE to
> + operating on a type with the precision and sign given by PRECISION
> + and SIGN respectively. PRECISION is an arbitrary bit precision;
> + it might not be a whole number of bytes. */
> +
> +static void
> +vect_set_operation_type (stmt_vec_info stmt_info, tree type,
> + unsigned int precision, signop sign)
> +{
> + /* Round the precision up to a whole number of bytes. */
> + precision = vect_element_precision (precision);
> + if (precision < TYPE_PRECISION (type)
> + && (!stmt_info->operation_precision
> + || stmt_info->operation_precision > precision))
> + {
> + stmt_info->operation_precision = precision;
> + stmt_info->operation_sign = sign;
> + }
> +}
> +
> +/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
> + non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
> + is an arbitrary bit precision; it might not be a whole number of bytes. */
> +
> +static void
> +vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
> + unsigned int min_input_precision)
> +{
> + /* This operation in isolation only requires the inputs to have
> + MIN_INPUT_PRECISION of precision. However, that doesn't mean
> + that MIN_INPUT_PRECISION is a natural precision for the chain
> + as a whole. E.g. consider something like:
> +
> + unsigned short *x, *y;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + The right shift can be done on unsigned chars, and only requires the
> + result of "*x & 0xf0" to be done on unsigned chars. But taking that
> + approach would mean turning a natural chain of single-vector unsigned
> + short operations into one that truncates "*x" and then extends
> + "(*x & 0xf0) >> 4", with two vectors for each unsigned short
> + operation and one vector for each unsigned char operation.
> + This would be a significant pessimization.
> +
> + Instead only propagate the maximum of this precision and the precision
> + required by the users of the result. This means that we don't pessimize
> + the case above but continue to optimize things like:
> +
> + unsigned char *y;
> + unsigned short *x;
> + *y = ((*x & 0xf0) >> 4) | (*y << 4);
> +
> + Here we would truncate two vectors of *x to a single vector of
> + unsigned chars and use single-vector unsigned char operations for
> + everything else, rather than doing two unsigned short copies of
> + "(*x & 0xf0) >> 4" and then truncating the result. */
> + min_input_precision = MAX (min_input_precision,
> + stmt_info->min_output_precision);
> +
> + if (min_input_precision < TYPE_PRECISION (type)
> + && (!stmt_info->min_input_precision
> + || stmt_info->min_input_precision > min_input_precision))
> + stmt_info->min_input_precision = min_input_precision;
> +}
> +
> +/* Subroutine of vect_determine_min_output_precision. Return true if
> + we can calculate a reduced number of output bits for STMT_INFO,
> + whose result is LHS. */
> +
> +static bool
> +vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
> +{
> + /* Take the maximum precision required by users of the result. */
> + unsigned int precision = 0;
> + imm_use_iterator iter;
> + use_operand_p use;
> + FOR_EACH_IMM_USE_FAST (use, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
> + return false;
> + stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
> + if (!use_stmt_info->min_input_precision)
> + return false;
> + precision = MAX (precision, use_stmt_info->min_input_precision);
> + }
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
> + precision);
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
> + dump_printf (MSG_NOTE, " are significant\n");
> + }
> + stmt_info->min_output_precision = precision;
> + return true;
> +}
> +
> +/* Calculate min_output_precision for STMT_INFO. */
> +
> +static void
> +vect_determine_min_output_precision (stmt_vec_info stmt_info)
> +{
> + /* We're only interested in statements with a narrowable result. */
> + tree lhs = gimple_get_lhs (stmt_info->stmt);
> + if (!lhs
> + || TREE_CODE (lhs) != SSA_NAME
> + || !vect_narrowable_type_p (TREE_TYPE (lhs)))
> + return;
> +
> + if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
> + stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
> +}
> +
> +/* Use range information to decide whether STMT (described by STMT_INFO)
> + could be done in a narrower type. This is effectively a forward
> + propagation, since it uses context-independent information that applies
> + to all users of an SSA name. */
> +
> +static void
> +vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> +{
> + tree lhs = gimple_assign_lhs (stmt);
> + if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> + return;
> +
> + tree type = TREE_TYPE (lhs);
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + /* First see whether we have any useful range information for the result. */
> + unsigned int precision = TYPE_PRECISION (type);
> + signop sign = TYPE_SIGN (type);
> + wide_int min_value, max_value;
> + if (!vect_get_range_info (lhs, &min_value, &max_value))
> + return;
> +
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int nops = gimple_num_ops (stmt);
> +
> + if (!vect_truncatable_operation_p (code))
> + /* Check that all relevant input operands are compatible, and update
> + [MIN_VALUE, MAX_VALUE] to include their ranges. */
> + for (unsigned int i = 1; i < nops; ++i)
> + {
> + tree op = gimple_op (stmt, i);
> + if (TREE_CODE (op) == INTEGER_CST)
> + {
> + /* Don't require the integer to have TYPE (which it might
> + not for things like shift amounts, etc.), but do require it
> + to fit the type. */
> + if (!int_fits_type_p (op, type))
> + return;
> +
> + min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
> + max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
> + }
> + else if (TREE_CODE (op) == SSA_NAME)
> + {
> + /* Ignore codes that don't take uniform arguments. */
> + if (!types_compatible_p (TREE_TYPE (op), type))
> + return;
> +
> + wide_int op_min_value, op_max_value;
> + if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> + return;
> +
> + min_value = wi::min (min_value, op_min_value, sign);
> + max_value = wi::max (max_value, op_max_value, sign);
> + }
> + else
> + return;
> + }
> +
> + /* Try to switch signed types for unsigned types if we can.
> + This is better for two reasons. First, unsigned ops tend
> + to be cheaper than signed ops. Second, it means that we can
> + handle things like:
> +
> + signed char c;
> + int res = (int) c & 0xff00; // range [0x0000, 0xff00]
> +
> + as:
> +
> + signed char c;
> + unsigned short res_1 = (unsigned short) c & 0xff00;
> + int res = (int) res_1;
> +
> + where the intermediate result res_1 has unsigned rather than
> + signed type. */
> + if (sign == SIGNED && !wi::neg_p (min_value))
> + sign = UNSIGNED;
> +
> + /* See what precision is required for MIN_VALUE and MAX_VALUE. */
> + unsigned int precision1 = wi::min_precision (min_value, sign);
> + unsigned int precision2 = wi::min_precision (max_value, sign);
> + unsigned int value_precision = MAX (precision1, precision2);
> + if (value_precision >= precision)
> + return;
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without loss of precision: ",
> + sign == SIGNED ? "signed" : "unsigned",
> + value_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> +
> + vect_set_operation_type (stmt_info, type, value_precision, sign);
> + vect_set_min_input_precision (stmt_info, type, value_precision);
> +}
> +
> +/* Use information about the users of STMT's result to decide whether
> + STMT (described by STMT_INFO) could be done in a narrower type.
> + This is effectively a backward propagation. */
> +
> +static void
> +vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
> +{
> + tree_code code = gimple_assign_rhs_code (stmt);
> + unsigned int opno = (code == COND_EXPR ? 2 : 1);
> + tree type = TREE_TYPE (gimple_op (stmt, opno));
> + if (!vect_narrowable_type_p (type))
> + return;
> +
> + unsigned int precision = TYPE_PRECISION (type);
> + unsigned int operation_precision, min_input_precision;
> + switch (code)
> + {
> + CASE_CONVERT:
> + /* Only the bits that contribute to the output matter. Don't change
> + the precision of the operation itself. */
> + operation_precision = precision;
> + min_input_precision = stmt_info->min_output_precision;
> + break;
> +
> + case LSHIFT_EXPR:
> + case RSHIFT_EXPR:
> + {
> + tree shift = gimple_assign_rhs2 (stmt);
> + if (TREE_CODE (shift) != INTEGER_CST
> + || !wi::ltu_p (wi::to_widest (shift), precision))
> + return;
> + unsigned int const_shift = TREE_INT_CST_LOW (shift);
> + if (code == LSHIFT_EXPR)
> + {
> + /* We need CONST_SHIFT fewer bits of the input. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = (MAX (operation_precision, const_shift)
> + - const_shift);
> + }
> + else
> + {
> + /* We need CONST_SHIFT extra bits to do the operation. */
> + operation_precision = (stmt_info->min_output_precision
> + + const_shift);
> + min_input_precision = operation_precision;
> + }
> + break;
> + }
> +
> + default:
> + if (vect_truncatable_operation_p (code))
> + {
> + /* Input bit N has no effect on output bits N-1 and lower. */
> + operation_precision = stmt_info->min_output_precision;
> + min_input_precision = operation_precision;
> + break;
> + }
> + return;
> + }
> +
> + if (operation_precision < precision)
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
> + " without affecting users: ",
> + TYPE_UNSIGNED (type) ? "unsigned" : "signed",
> + operation_precision);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + }
> + vect_set_operation_type (stmt_info, type, operation_precision,
> + TYPE_SIGN (type));
> + }
> + vect_set_min_input_precision (stmt_info, type, min_input_precision);
> +}
> +
> +/* Handle vect_determine_precisions for STMT_INFO, given that we
> + have already done so for the users of its result. */
> +
> +void
> +vect_determine_stmt_precisions (stmt_vec_info stmt_info)
> +{
> + vect_determine_min_output_precision (stmt_info);
> + if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
> + {
> + vect_determine_precisions_from_range (stmt_info, stmt);
> + vect_determine_precisions_from_users (stmt_info, stmt);
> + }
> +}
> +
> +/* Walk backwards through the vectorizable region to determine the
> + values of these fields:
> +
> + - min_output_precision
> + - min_input_precision
> + - operation_precision
> + - operation_sign. */
> +
> +void
> +vect_determine_precisions (vec_info *vinfo)
> +{
> + DUMP_VECT_SCOPE ("vect_determine_precisions");
> +
> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> + {
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> + unsigned int nbbs = loop->num_nodes;
> +
> + for (unsigned int i = 0; i < nbbs; i++)
> + {
> + basic_block bb = bbs[nbbs - i - 1];
> + for (gimple_stmt_iterator si = gsi_last_bb (bb);
> + !gsi_end_p (si); gsi_prev (&si))
> + vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
> + }
> + }
> + else
> + {
> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> + gimple_stmt_iterator si = bb_vinfo->region_end;
> + gimple *stmt;
> + do
> + {
> + if (!gsi_stmt (si))
> + si = gsi_last_bb (bb_vinfo->bb);
> + else
> + gsi_prev (&si);
> + stmt = gsi_stmt (si);
> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> + if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
> + vect_determine_stmt_precisions (stmt_info);
> + }
> + while (stmt != gsi_stmt (bb_vinfo->region_begin));
> + }
> +}
> +
> typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
>
> struct vect_recog_func
> @@ -4217,13 +4566,14 @@ struct vect_recog_func
> taken which means usually the more complex one needs to precede the
> less complex ones (widen_sum only after dot_prod or sad for example). */
> static vect_recog_func vect_vect_recog_func_ptrs[] = {
> + { vect_recog_over_widening_pattern, "over_widening" },
> + { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
> { vect_recog_widen_mult_pattern, "widen_mult" },
> { vect_recog_dot_prod_pattern, "dot_prod" },
> { vect_recog_sad_pattern, "sad" },
> { vect_recog_widen_sum_pattern, "widen_sum" },
> { vect_recog_pow_pattern, "pow" },
> { vect_recog_widen_shift_pattern, "widen_shift" },
> - { vect_recog_over_widening_pattern, "over_widening" },
> { vect_recog_rotate_pattern, "rotate" },
> { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> { vect_recog_divmod_pattern, "divmod" },
> @@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
> unsigned int i, j;
> auto_vec<gimple *, 1> stmts_to_replace;
>
> + vect_determine_precisions (vinfo);
> +
> DUMP_VECT_SCOPE ("vect_pattern_recog");
>
> if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c 2016-11-11 17:07:36.776796115 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c 2018-07-03 09:02:36.567413531 +0100
> @@ -43,5 +43,5 @@ int main (void)
>
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
> -/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
> +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2018-07-03 09:02:36.563413564 +0100
> @@ -62,8 +62,9 @@ int main (void)
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2018-07-03 09:02:36.563413564 +0100
> @@ -58,7 +58,9 @@ int main (void)
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c 2018-07-03 09:02:36.563413564 +0100
> @@ -57,7 +57,12 @@ int main (void)
> return 0;
> }
>
> -/* Final value stays in int, so no over-widening is detected at the moment. */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> +/* This is an over-widening even though the final result is still an int.
> + It's better to do one vector of ops on chars and then widen than to
> + widen and then do 4 vectors of ops on ints. */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c 2018-07-03 09:02:36.563413564 +0100
> @@ -57,7 +57,12 @@ int main (void)
> return 0;
> }
>
> -/* Final value stays in int, so no over-widening is detected at the moment. */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> +/* This is an over-widening even though the final result is still an int.
> + It's better to do one vector of ops on chars and then widen than to
> + widen and then do 4 vectors of ops on ints. */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c 2018-07-03 09:02:36.563413564 +0100
> @@ -57,6 +57,9 @@ int main (void)
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2018-07-03 09:02:36.563413564 +0100
> @@ -59,7 +59,9 @@ int main (void)
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2018-07-03 09:02:36.563413564 +0100
> @@ -66,8 +66,9 @@ int main (void)
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-07-03 09:01:31.075962445 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2018-07-03 09:02:36.563413564 +0100
> @@ -62,7 +62,9 @@ int main (void)
> }
>
> /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
> +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
> ===================================================================
> --- /dev/null 2018-06-13 14:36:57.192460992 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-03 09:02:36.563413564 +0100
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-require-effective-target vect_pack_trunc } */
> +/* { dg-require-effective-target vect_unpack } */
> +
> +#include "tree-vect.h"
> +
> +/* Deliberate use of signed >>. */
> +#define DEF_LOOP(SIGNEDNESS) \
> + void __attribute__ ((noipa)) \
> + f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
> + SIGNEDNESS char *restrict b, \
> + SIGNEDNESS char *restrict c) \
> + { \
> + a[0] = (b[0] + c[0]) >> 1; \
> + a[1] = (b[1] + c[1]) >> 1; \
> + a[2] = (b[2] + c[2]) >> 1; \
> + a[3] = (b[3] + c[3]) >> 1; \
> + a[4] = (b[4] + c[4]) >> 1; \
> + a[5] = (b[5] + c[5]) >> 1; \
> + a[6] = (b[6] + c[6]) >> 1; \
> + a[7] = (b[7] + c[7]) >> 1; \
> + a[8] = (b[8] + c[8]) >> 1; \
> + a[9] = (b[9] + c[9]) >> 1; \
> + a[10] = (b[10] + c[10]) >> 1; \
> + a[11] = (b[11] + c[11]) >> 1; \
> + a[12] = (b[12] + c[12]) >> 1; \
> + a[13] = (b[13] + c[13]) >> 1; \
> + a[14] = (b[14] + c[14]) >> 1; \
> + a[15] = (b[15] + c[15]) >> 1; \
> + }
> +
> +DEF_LOOP (signed)
> +DEF_LOOP (unsigned)
> +
> +#define N 16
> +
> +#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
> + { \
> + SIGNEDNESS char a[N], b[N], c[N]; \
> + for (int i = 0; i < N; ++i)
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-03 20:08 ` Christophe Lyon
@ 2018-07-03 20:39 ` Rainer Orth
2018-07-04 7:18 ` Richard Sandiford
1 sibling, 0 replies; 10+ messages in thread
From: Rainer Orth @ 2018-07-03 20:39 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Richard Biener, gcc Patches, Richard Sandiford
Hi Christophe,
> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
> does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "basic block vectorized" 2
indeed, but that's not enough: adding
/* { dg-additional-options "-fdump-tree-vect-details" } */
to both affected tests (gcc.dg/vect/bb-slp-over-widen-[12].c) yields
FAIL: gcc.dg/vect/bb-slp-over-widen-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "basic block vectorized" 2
FAIL: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "basic block vectorized" 2
on both 32 and 64-bit x86, and the dump contains:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:60:3: note: not vectorized: control flow in loop.
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:60:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:59:3: note: not vectorized: control flow in loop.
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:59:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c:55:1: note: vectorized 0 loops in function.
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
* Re: [14/n] PR85694: Rework overwidening detection
2018-07-03 20:08 ` Christophe Lyon
2018-07-03 20:39 ` Rainer Orth
@ 2018-07-04 7:18 ` Richard Sandiford
1 sibling, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2018-07-04 7:18 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Richard Biener, gcc Patches
Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> > This patch is the main part of PR85694. The aim is to recognise
>> >> > at least:
>> >> >
>> >> > signed char *a, *b, *c;
>> >> > ...
>> >> > for (int i = 0; i < 2048; i++)
>> >> > c[i] = (a[i] + b[i]) >> 1;
>> >> >
>> >> > as an over-widening pattern, since the addition and shift can be done
>> >> > on shorts rather than ints. However, it ended up being a lot more
>> >> > general than that.
>> >> >
>> >> > The current over-widening pattern detection is limited to a few simple
>> >> > cases: logical ops with immediate second operands, and shifts by a
>> >> > constant. These cases are enough for common pixel-format conversion
>> >> > and can be detected in a peephole way.
>> >> >
>> >> > The loop above requires two generalisations of the current code: support
>> >> > for addition as well as logical ops, and support for non-constant second
>> >> > operands. These are harder to detect in the same peephole way, so the
>> >> > patch tries to take a more global approach.
>> >> >
>> >> > The idea is to get information about the minimum operation width
>> >> > in two ways:
>> >> >
>> >> > (1) by using the range information attached to the SSA_NAMEs
>> >> > (effectively a forward walk, since the range info is
>> >> > context-independent).
>> >> >
>> >> > (2) by back-propagating the number of output bits required by
>> >> > users of the result.
>> >> >
>> >> > As explained in the comments, there's a balance to be struck between
>> >> > narrowing an individual operation and fitting in with the surrounding
>> >> > code. The approach is pretty conservative: if we could narrow an
>> >> > operation to N bits without changing its semantics, it's OK to do
>> >> > that if:
>> >> >
>> >> > - no operations later in the chain require more than N bits; or
>> >> >
>> >> > - all internally-defined inputs are extended from N bits or fewer,
>> >> > and at least one of them is single-use.
>> >> >
>> >> > See the comments for the rationale.
>> >> >
>> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> >> > since the code seemed more readable without.
>> >> >
>> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>> >>
>> >> Here's a version rebased on top of current trunk. Changes from last time:
>> >>
>> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> >> prototype
>> >>
>> >> - fix a typo in a comment
>> >>
>> >> - use vect_element_precision from the new version of 12/n.
>> >>
>> >> Tested as before. OK to install?
>> >
>> > OK.
>>
>> Thanks. For the record, here's what I installed (updated on top of
>> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>>
>> Richard
>>
> Hi,
>
> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
> does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "basic block vectorized" 2
I've applied the following as obvious.
Richard
2018-07-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/testsuite/
	* gcc.dg/vect/bb-slp-over-widen-1.c: Fix name of dump file for
	final scan test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-04 08:16:36.210113069 +0100
@@ -63,4 +63,4 @@ main (void)
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-07-04 08:16:36.210113069 +0100
@@ -62,4 +62,4 @@ main (void)
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Thread overview: 10+ messages
2018-06-20 10:37 [14/n] PR85694: Rework overwidening detection Richard Sandiford
2018-06-29 12:56 ` Richard Sandiford
2018-07-02 11:02 ` Christophe Lyon
2018-07-02 13:37 ` Richard Sandiford
2018-07-02 13:52 ` Christophe Lyon
2018-07-02 13:12 ` Richard Biener
2018-07-03 10:02 ` Richard Sandiford
2018-07-03 20:08 ` Christophe Lyon
2018-07-03 20:39 ` Rainer Orth
2018-07-04 7:18 ` Richard Sandiford