From: Ilya Enkovich <enkovich.gnu@gmail.com>
To: Jeff Law <law@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, vec-tails 07/10] Support loop epilogue combining
Date: Tue, 28 Jun 2016 13:37:00 -0000
Message-ID: <20160628122439.GB4143@msticlxl57.ims.intel.com>
In-Reply-To: <18ccae1a-30c3-c23c-e28f-287f9d41eaa0@redhat.com>
On 16 Jun 10:54, Jeff Law wrote:
> On 05/19/2016 01:44 PM, Ilya Enkovich wrote:
> >Hi,
> >
> >This patch introduces support for loop epilogue combining. This includes
> >support in cost estimation and all changes required to mask the
> >vectorized loop.
> >
> >Thanks,
> >Ilya
> >--
> >gcc/
> >
> >2016-05-19 Ilya Enkovich <ilya.enkovich@intel.com>
> >
> > * dbgcnt.def (vect_tail_combine): New.
> > * params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
> > * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
> > * tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
> > epilogue combined with loop body.
> > (vect_do_peeling_for_loop_bound): Likewise.
> > * tree-vect-loop.c Include alias.h and dbgcnt.h.
> > (vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
> > arg, compute number of iterations for which loop epilogue combining is
> > profitable.
> > (vect_generate_tmps_on_preheader): Support combined epilogue.
> > (vect_gen_ivs_for_masking): New.
> > (vect_get_mask_index_for_elems): New.
> > (vect_get_mask_index_for_type): New.
> > (vect_gen_loop_masks): New.
> > (vect_mask_reduction_stmt): New.
> > (vect_mask_mask_load_store_stmt): New.
> > (vect_mask_load_store_stmt): New.
> > (vect_combine_loop_epilogue): New.
> > (vect_transform_loop): Support combined epilogue.
> >
> >
> >diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> >index fab5879..b3c0668 100644
> >--- a/gcc/tree-vect-loop-manip.c
> >+++ b/gcc/tree-vect-loop-manip.c
> >@@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
> > bb_between_loops = new_exit_bb;
> > bb_after_second_loop = split_edge (single_exit (second_loop));
> >
> >- pre_condition =
> >- fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
> >- skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
> >- bb_after_second_loop, bb_before_first_loop,
> >- inverse_probability (second_guard_probability));
> >+ if (skip_second_after_first)
> >+ /* We can just redirect edge from bb_between_loops to
> >+ bb_after_second_loop but we have many code assuming
> >+ we have a guard after the first loop. So just make an
> >+ always-taken condition. */
> >+ pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
> >+ integer_zero_node);
> This isn't ideal, but I don't think it's that big of an issue.
>
> >@@ -1758,8 +1772,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> > basic_block preheader;
> > int loop_num;
> > int max_iter;
> >+ int bound2;
> > tree cond_expr = NULL_TREE;
> > gimple_seq cond_expr_stmt_list = NULL;
> >+ bool combine = LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo);
> >
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_NOTE, vect_location,
> >@@ -1769,12 +1785,13 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> >
> > loop_num = loop->num;
> >
> >+ bound2 = combine ? th : LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> Can you document what the TH parameter is to the various routines that use
> it in tree-vect-loop-manip.c? I realize you didn't add it, but it would
> help anyone looking at this code in the future to know it's the threshold of
> iterations for vectorization without having to find it in other function
> comment headers ;-)
>
> That's pre-approved to go in immediately :-)
>
> >@@ -1803,7 +1820,11 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> > max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
> > : LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
> >- if (check_profitability)
> >+ /* When epilogue is combined only profitability
> >+ treshold matters. */
> s/treshold/threshold/
>
>
>
> > static void
> > vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> > int *ret_min_profitable_niters,
> >- int *ret_min_profitable_estimate)
> >+ int *ret_min_profitable_estimate,
> >+ int *ret_min_profitable_combine_niters)
> I'm torn a bit here. There's all kinds of things missing/incomplete in the
> function comments throughout the vectorizer. And in some cases, like this
> one, the parameters are largely self-documenting. But we've also got coding
> standards that we'd like to adhere to.
>
> I don't think it's fair to require you to fix all these issues in the
> vectorizer (though if you wanted to, I'd fully support those as independent
> cleanups).
>
> Perhaps just document LOOP_VINFO with a generic comment about the ret_*
> parameters for this function rather than a comment for each ret_* parameter.
> Pre-approved for the trunk independent of the vec-tails work.
>
>
> >@@ -3728,6 +3784,77 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> > min_profitable_estimate);
> >
> >+
> >+ unsigned combine_treshold
> >+ = PARAM_VALUE (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD);
> >+ /* Calculate profitability combining epilogue with the main loop.
> >+ We have a threshold for inside cost overhead (not applied
> >+ for low trip count loop case):
> >+ MIC * 100 < VIC * CT
> >+ Masked iteration should be better than a scalar prologue:
> >+ MIC + VIC < SIC * epilogue_niters */
> Can you double-check the whitespace formatting here. Where does the "100"
> come from and should it be a param?
I checked the formatting. We have 100 here because combine_treshold
is measured in percent; e.g. a value of 2 means the masking overhead
per iteration shouldn't exceed 2% of the vector iteration cost.
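To make that concrete, here is a tiny standalone model of the two
checks (the variable names mirror the patch; all cost values are
made-up numbers for illustration):

  /* Toy model of the combine-profitability checks; the costs below
     are invented, not taken from any real target.  */
  #include <stdio.h>

  int
  main (void)
  {
    unsigned combine_treshold = 10;        /* param value, in percent  */
    unsigned masking_inside_cost = 4;      /* MIC  */
    unsigned vec_inside_cost = 50;         /* VIC  */
    unsigned scalar_single_iter_cost = 12; /* SIC  */
    unsigned epilogue_niters = 7;

    /* Overhead check: MIC * 100 < VIC * CT, i.e. masking may not add
       more than CT percent to the vector iteration cost.  */
    int overhead_ok = masking_inside_cost * 100
                      < vec_inside_cost * combine_treshold;

    /* A masked iteration must beat the scalar epilogue:
       MIC + VIC < SIC * epilogue_niters.  */
    int beats_scalar = masking_inside_cost + vec_inside_cost
                       < scalar_single_iter_cost * epilogue_niters;

    printf ("combining profitable: %s\n",
            overhead_ok && beats_scalar ? "yes" : "no");
    return 0;
  }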
>
>
> >@@ -6886,6 +7030,485 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
> > return;
> > }
> >
>
> >+
> >+/* Function vect_gen_loop_masks.
> >+
> >+ Create masks to mask a loop desvribed by LOOP_VINFO. Masks
> s/desvribed/described/
>
> >+ are created according to LOOP_VINFO_REQUIRED_MASKS and are stored
> >+ into MASKS vector.
> >+
> >+ Index of a mask in a vector is computed according to a number
> >+ of masks's elements. Masks are sorted by number of its elements
> >+ in descending order. Index 0 is used to access a mask with
> >+ current_vector_size elements. Among masks with the same number
> >+ of elements the one with lower index is used to mask iterations
> >+ with smaller iteration counter. Note that you may get NULL elements
> >+ for masks which are not required. Use vect_get_mask_index_for_elems
> >+ or vect_get_mask_index_for_type to access resulting vector. */
> >+
> >+static void
> >+vect_gen_loop_masks (loop_vec_info loop_vinfo, vec<tree> *masks)
> >+{
> >+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >+ edge pe = loop_preheader_edge (loop);
> >+ tree niters = LOOP_VINFO_NITERS (loop_vinfo);
> >+ unsigned min_mask_elems, max_mask_elems, nmasks;
> >+ unsigned iv_elems, cur_mask, prev_mask, cur_mask_elems;
> >+ auto_vec<tree> ivs;
> >+ tree vectype, mask_type;
> >+ tree vec_niters, vec_niters_val, mask;
> >+ gimple *stmt;
> >+ basic_block bb;
> >+ gimple_stmt_iterator gsi = gsi_after_labels (loop->header);
> >+ unsigned vec_size;
> >+
> >+ /* Create required IVs. */
> >+ vect_gen_ivs_for_masking (loop_vinfo, &ivs);
> >+ vectype = TREE_TYPE (ivs[0]);
> >+
> >+ vec_size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
> >+ iv_elems = TYPE_VECTOR_SUBPARTS (vectype);
> >+
> >+ /* Get a proper niter to build a vector. */
> >+ if (!is_gimple_val (niters))
> >+ {
> >+ gimple_seq seq = NULL;
> >+ niters = force_gimple_operand (niters, &seq, true, NULL);
> >+ gsi_insert_seq_on_edge_immediate (pe, seq);
> >+ }
> >+ /* We may need a type cast in case niter has a too small type
> >+ for generated IVs. */
> Nit. There should be vertical whitespace after the close brace and the
> comment for the next logical block of code. Can you do a scan over the
> patchkit looking for other instances where the vertical whitespace is
> needed.
>
> Generally, if you find that a blob of code needs a comment, then the comment
> and blob of code should have that vertical whitespace to visually separate
> it from everything else.
>
>
>
> >+/* Function vect_combine_loop_epilogue.
> >+
> >+ Combine loop epilogue with the main vectorized body. It requires
> >+ masking of memory accesses and reductions. */
> So you mask reductions, loads & stores. Is there anything else that we
> might potentially need to mask to combine the loop & epilogue via masking?
>
>
> I don't see anything particularly worrisome here either -- I have a slight
> concern about correctness issues with only masking loads/stores and
> reductions. But I will defer to your judgment on whether or not there's
> other stuff that we need to mask to combine the epilogue with the loop via
> masking.
We have to mask operations which may cause errors if executed speculatively.
For the others we just ignore the produced result. So we don't truly mask
reductions but fix up their results. I assume memory accesses are the only
ones we have to truly mask (plus non-const calls, which are rejected for
now). For signalling arithmetic I assumed we just don't vectorize it.
Basically we should act similarly to if-conversion. I'll check whether it
has restrictions I missed.
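To make the distinction concrete, here is a scalar C model of one
masked vector iteration (my own sketch with made-up names, not the
GIMPLE the patch generates):

  /* Scalar model of one masked vector iteration of width VF; purely
     illustrative -- the real transform rewrites vector statements.  */
  #define VF 4

  static void
  masked_iteration (const int *a, int i, int niters, int r[VF])
  {
    for (int lane = 0; lane < VF; lane++)
      {
        int m = i + lane < niters;  /* per-lane loop mask  */

        /* Memory accesses must be truly masked: an inactive lane
           could fault reading past the array (hence IFN_MASK_LOAD).  */
        int v = m ? a[i + lane] : 0;

        /* The reduction is computed for all lanes and only its result
           is fixed up (hence VEC_COND_EXPR): inactive lanes keep the
           previous value.  */
        int tmp = r[lane] + v;
        r[lane] = m ? tmp : r[lane];
      }
  }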
>
> Jeff
Here is an updated patch version.
Thanks,
Ilya
--
gcc/
2016-05-28 Ilya Enkovich <ilya.enkovich@intel.com>
* dbgcnt.def (vect_tail_combine): New.
* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
epilogue combined with loop body.
(vect_do_peeling_for_loop_bound): Likewise.
(vect_do_peeling_for_alignment): ???
* tree-vect-loop.c: Include alias.h and dbgcnt.h.
(vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
arg, compute number of iterations for which loop epilogue combining is
profitable.
(vect_generate_tmps_on_preheader): Support combined epilogue.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_combine_loop_epilogue): New.
(vect_transform_loop): Support combined epilogue.
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 78ddcc2..73c2966 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
DEBUG_COUNTER (tree_sra)
DEBUG_COUNTER (vect_loop)
DEBUG_COUNTER (vect_slp)
+DEBUG_COUNTER (vect_tail_combine)
DEBUG_COUNTER (dom_unreachable_edges)
diff --git a/gcc/params.def b/gcc/params.def
index 62a1e40..98d6c5a 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1220,6 +1220,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
"Maximum number of may-defs visited when devirtualizing "
"speculatively", 50, 0, 0)
+DEFPARAM (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD,
+ "vect-cost-increase-combine-threshold",
+ "Cost increase threshold to mask main loop for epilogue.",
+ 10, 0, 300)
+
/*
Local variables:
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index a902a50..26e0cc1 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4007,6 +4007,9 @@ vect_get_new_ssa_name (tree type, enum vect_var_kind var_kind, const char *name)
case vect_scalar_var:
prefix = "stmp";
break;
+ case vect_mask_var:
+ prefix = "mask";
+ break;
case vect_pointer_var:
prefix = "vectp";
break;
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index c26aa1d..7403686 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1195,6 +1195,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
int probability_of_second_loop;
+ bool skip_second_after_first = false;
if (!slpeel_can_duplicate_loop_p (loop, e))
return NULL;
@@ -1393,7 +1394,11 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
{
loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
- unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
+ unsigned limit = 0;
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ skip_second_after_first = true;
+ else
+ limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
limit = limit + 1;
if (check_profitability
@@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
bb_between_loops = new_exit_bb;
bb_after_second_loop = split_edge (single_exit (second_loop));
- pre_condition =
- fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
- skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
- bb_after_second_loop, bb_before_first_loop,
- inverse_probability (second_guard_probability));
+ if (skip_second_after_first)
+ /* We could just redirect the edge from bb_between_loops to
+ bb_after_second_loop, but a lot of code assumes there is a
+ guard after the first loop. So just make an always-taken
+ condition. */
+ pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
+ integer_zero_node);
+ else
+ pre_condition =
+ fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
+ skip_e
+ = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
+ bb_after_second_loop, bb_before_first_loop,
+ inverse_probability (second_guard_probability));
scale_loop_profile (second_loop, probability_of_second_loop, bound2);
slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
second_loop == new_loop, &new_exit_bb);
@@ -1762,8 +1776,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
basic_block preheader;
int loop_num;
int max_iter;
+ int bound2;
tree cond_expr = NULL_TREE;
gimple_seq cond_expr_stmt_list = NULL;
+ bool combine = LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -1773,12 +1789,13 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
loop_num = loop->num;
+ bound2 = combine ? th : LOOP_VINFO_VECT_FACTOR (loop_vinfo);
new_loop
= slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
&ratio_mult_vf_name, ni_name, false,
th, check_profitability,
cond_expr, cond_expr_stmt_list,
- 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+ 0, bound2);
gcc_assert (new_loop);
gcc_assert (loop_num == loop->num);
slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
@@ -1807,7 +1824,12 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
: LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
- if (check_profitability)
+
+ /* When the epilogue is combined, only the profitability
+ threshold matters. */
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ max_iter = (int) th - 1;
+ else if (check_profitability)
max_iter = MAX (max_iter, (int) th - 1);
record_niter_bound (new_loop, max_iter, false, true);
dump_printf (MSG_NOTE,
@@ -2044,7 +2066,8 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
bound, 0);
gcc_assert (new_loop);
- slpeel_checking_verify_cfg_after_peeling (new_loop, loop);
+ if (!LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ slpeel_checking_verify_cfg_after_peeling (new_loop, loop);
/* For vectorization factor N, we need to copy at most N-1 values
for alignment and this means N-2 loopback edge executions. */
max_iter = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 2;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 41b9380..08fad82 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -50,6 +50,8 @@ along with GCC; see the file COPYING3. If not see
#include "gimple-fold.h"
#include "cgraph.h"
#include "tree-if-conv.h"
+#include "alias.h"
+#include "dbgcnt.h"
/* Loop Vectorization Pass.
@@ -149,7 +151,8 @@ along with GCC; see the file COPYING3. If not see
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
*/
-static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *);
+static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *,
+ int *);
/* Function vect_determine_vectorization_factor
@@ -2310,8 +2313,10 @@ start_over:
/* Analyze cost. Decide if worth while to vectorize. */
int min_profitable_estimate, min_profitable_iters;
+ int min_profitable_combine_iters;
vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
- &min_profitable_estimate);
+ &min_profitable_estimate,
+ &min_profitable_combine_iters);
if (min_profitable_iters < 0)
{
@@ -2420,6 +2425,52 @@ start_over:
gcc_assert (vectorization_factor
== (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+ if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+ {
+ LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
+ LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
+ }
+ else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+ && min_profitable_combine_iters >= 0)
+ {
+ if (((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && (LOOP_VINFO_INT_NITERS (loop_vinfo)
+ >= (unsigned) min_profitable_combine_iters))
+ || estimated_niter == -1
+ || estimated_niter >= min_profitable_combine_iters)
+ && dbg_cnt (vect_tail_combine))
+ {
+ LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
+ LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = true;
+
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Decided to combine loop with its epilogue.\n");
+
+ /* If the epilogue is combined, we need to adjust the
+ profitability check to account for the additional vector
+ iteration and the profitable combine iterations. */
+ if ((int)(min_profitable_combine_iters + vectorization_factor)
+ > min_scalar_loop_bound)
+ {
+ LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
+ = (unsigned) min_profitable_combine_iters;
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Updated runtime profitability treshold: %d\n",
+ min_profitable_combine_iters);
+
+ }
+ }
+ else
+ {
+ if (!LOOP_VINFO_NEED_MASKING (loop_vinfo) && dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Not combined loop with epilogue: iterations "
+ "count is too low (threshold is %d).\n",
+ min_profitable_combine_iters);
+ }
+ }
+
/* Ok to vectorize! */
return true;
@@ -3392,12 +3443,18 @@ vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
profitability check.
*RET_MIN_PROFITABLE_ESTIMATE is a profitability threshold to be used
- for static check against estimated number of iterations. */
+ for static check against estimated number of iterations.
+
+ *RET_MIN_PROFITABLE_COMBINE_NITERS is a cost model profitability threshold
+ of iterations for vectorization with a combined loop epilogue. -1 means
+ combining is not profitable. The value may be used for a dynamic
+ profitability check. */
static void
vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
int *ret_min_profitable_niters,
- int *ret_min_profitable_estimate)
+ int *ret_min_profitable_estimate,
+ int *ret_min_profitable_combine_niters)
{
int min_profitable_iters;
int min_profitable_estimate;
@@ -3641,6 +3698,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
vec_prologue_cost);
dump_printf (MSG_NOTE, " Vector epilogue cost: %d\n",
vec_epilogue_cost);
+ dump_printf (MSG_NOTE, " Masking prologue cost: %d\n",
+ masking_prologue_cost);
+ dump_printf (MSG_NOTE, " Masking inside cost: %d\n",
+ masking_inside_cost);
dump_printf (MSG_NOTE, " Scalar iteration cost: %d\n",
scalar_single_iter_cost);
dump_printf (MSG_NOTE, " Scalar outside cost: %d\n",
@@ -3744,6 +3805,77 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
min_profitable_estimate);
*ret_min_profitable_estimate = min_profitable_estimate;
+
+ *ret_min_profitable_combine_niters = -1;
+
+ /* Don't try to vectorize an epilogue of an epilogue. */
+ if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+ return;
+
+ if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+ {
+ if (flag_vect_epilogue_cost_model == VECT_COST_MODEL_UNLIMITED)
+ {
+ if (flag_tree_vectorize_epilogues & VECT_EPILOGUE_COMBINE)
+ *ret_min_profitable_combine_niters = 0;
+ return;
+ }
+
+ unsigned combine_treshold
+ = PARAM_VALUE (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD);
+ /* Calculate the profitability of combining the epilogue with the
+ main loop. We have a percentage threshold for the inside cost
+ overhead (not applied in the low trip count loop case):
+ MIC * 100 < VIC * CT
+ A masked iteration should also beat the scalar epilogue:
+ MIC + VIC < SIC * epilogue_niters */
+ if (masking_inside_cost * 100 >= vec_inside_cost * combine_treshold)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Combining loop with epilogue is not "
+ "profitable.\n");
+ dump_printf_loc (MSG_NOTE, vect_location,
+ " Combining overhead %d%% exceeds "
+ "treshold %d%%.\n",
+ masking_inside_cost * 100 / vec_inside_cost,
+ combine_treshold);
+ }
+ *ret_min_profitable_combine_niters = -1;
+ }
+ else if ((int)(masking_inside_cost + vec_inside_cost)
+ >= scalar_single_iter_cost * peel_iters_epilogue)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Combining loop with epilogue is not "
+ "profitable.\n");
+ dump_printf_loc (MSG_NOTE, vect_location,
+ " Scalar epilogue is faster than a "
+ "single masked iteration.\n");
+ }
+ *ret_min_profitable_combine_niters = -1;
+ }
+ else if (flag_tree_vectorize_epilogues & VECT_EPILOGUE_COMBINE)
+ {
+ int inside_cost = vec_inside_cost + masking_inside_cost;
+ int outside_cost = vec_outside_cost + masking_prologue_cost;
+ int profitable_iters = ((outside_cost - scalar_outside_cost) * vf
+ - inside_cost * peel_iters_prologue
+ - inside_cost * peel_iters_epilogue)
+ / ((scalar_single_iter_cost * vf)
+ - inside_cost);
+
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Combinig loop with epilogue "
+ "pofitability treshold = %d\n",
+ profitable_iters);
+ *ret_min_profitable_combine_niters = profitable_iters;
+ }
+ }
}
/* Writes into SEL a mask for a vec_perm, equivalent to a vec_shr by OFFSET
@@ -6852,20 +6984,37 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
else
ni_minus_gap_name = ni_name;
- /* Create: ratio = ni >> log2(vf) */
- /* ??? As we have ni == number of latch executions + 1, ni could
- have overflown to zero. So avoid computing ratio based on ni
- but compute it using the fact that we know ratio will be at least
- one, thus via (ni - vf) >> log2(vf) + 1. */
- ratio_name
- = fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
- fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
- fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name),
- ni_minus_gap_name,
- build_int_cst
- (TREE_TYPE (ni_name), vf)),
- log_vf),
- build_int_cst (TREE_TYPE (ni_name), 1));
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ {
+ /* Create: ratio = (ni + vf - 1) >> log2(vf) if the epilogue is combined. */
+ gcc_assert (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
+ ratio_name
+ = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
+ fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
+ ni_name,
+ build_int_cst (TREE_TYPE (ni_name),
+ vf - 1)),
+ log_vf);
+ }
+ else
+ {
+ /* Create: ratio = ni >> log2(vf) */
+ /* ??? As we have ni == number of latch executions + 1, ni could
+ have overflown to zero. So avoid computing ratio based on ni
+ but compute it using the fact that we know ratio will be at least
+ one, thus via (ni - vf) >> log2(vf) + 1. */
+ ratio_name
+ = fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
+ fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
+ fold_build2 (MINUS_EXPR,
+ TREE_TYPE (ni_name),
+ ni_minus_gap_name,
+ build_int_cst
+ (TREE_TYPE (ni_name), vf)),
+ log_vf),
+ build_int_cst (TREE_TYPE (ni_name), 1));
+ }
+
if (!is_gimple_val (ratio_name))
{
var = create_tmp_var (TREE_TYPE (ni_name), "bnd");
@@ -6895,6 +7044,489 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
return;
}
+/* Function vect_gen_ivs_for_masking.
+
+ Create IVs to be used for mask computation to mask the loop described
+ by LOOP_VINFO. Created IVs are stored in the IVS vector.
+
+ The initial IV value is {0, 1, ..., VF - 1} (possibly split into several
+ vectors; in that case IVS elements with lower indices hold IVs with
+ smaller numbers). The IV step is {VF, VF, ..., VF}. VF is the
+ vectorization factor being used. */
+
+static void
+vect_gen_ivs_for_masking (loop_vec_info loop_vinfo, vec<tree> *ivs)
+{
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ tree vectype = vect_get_masking_iv_type (loop_vinfo);
+ tree type = TREE_TYPE (vectype);
+ int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+ unsigned elems = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies = vf / elems;
+ int i, k;
+ tree iv, init_val, step_val;
+ bool insert_after;
+ gimple_stmt_iterator gsi;
+ tree *vtemp;
+
+ /* Create {VF, ..., VF} vector constant. */
+ step_val = build_vector_from_val (vectype, build_int_cst (type, vf));
+
+ vtemp = XALLOCAVEC (tree, vf);
+ for (i = 0; i < ncopies; i++)
+ {
+ /* Create initial IV value. */
+ for (k = 0; k < vf; k++)
+ vtemp[k] = build_int_cst (type, k + i * elems);
+ init_val = build_vector (vectype, vtemp);
+
+ /* Create an induction variable including the phi node. */
+ standard_iv_increment_position (loop, &gsi, &insert_after);
+ create_iv (init_val, step_val, NULL, loop, &gsi, insert_after,
+ &iv, NULL);
+ ivs->safe_push (iv);
+ }
+}
+
+/* Function vect_get_mask_index_for_elems.
+
+ A helper function to access the masks vector. See vect_gen_loop_masks
+ for the masks vector sorting description. Return the index of the
+ first mask having MASK_ELEMS elements. */
+
+static inline unsigned
+vect_get_mask_index_for_elems (unsigned mask_elems)
+{
+ return current_vector_size / mask_elems - 1;
+}
+
+/* Function vect_get_mask_index_for_type.
+
+ A helper function to access the masks vector. See vect_gen_loop_masks
+ for the masks vector sorting description. Return the index of the
+ first mask appropriate for VECTYPE. */
+
+static inline unsigned
+vect_get_mask_index_for_type (tree vectype)
+{
+ unsigned elems = TYPE_VECTOR_SUBPARTS (vectype);
+ return vect_get_mask_index_for_elems (elems);
+}
+
+/* Function vect_gen_loop_masks.
+
+ Create masks to mask a loop described by LOOP_VINFO. Masks
+ are created according to LOOP_VINFO_REQUIRED_MASKS and are stored
+ into the MASKS vector.
+
+ The index of a mask in the vector is computed from the number of
+ the mask's elements. Masks are sorted by the number of their
+ elements in descending order. Index 0 is used to access a mask
+ with current_vector_size elements. Among masks with the same
+ number of elements, the one with the lower index is used to mask
+ iterations with a smaller iteration counter. Note that you may get
+ NULL elements for masks which are not required. Use
+ vect_get_mask_index_for_elems or vect_get_mask_index_for_type to
+ access the resulting vector. */
+
+static void
+vect_gen_loop_masks (loop_vec_info loop_vinfo, vec<tree> *masks)
+{
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ edge pe = loop_preheader_edge (loop);
+ tree niters = LOOP_VINFO_NITERS (loop_vinfo);
+ unsigned min_mask_elems, max_mask_elems, nmasks;
+ unsigned iv_elems, cur_mask, prev_mask, cur_mask_elems;
+ auto_vec<tree> ivs;
+ tree vectype, mask_type;
+ tree vec_niters, vec_niters_val, mask;
+ gimple *stmt;
+ basic_block bb;
+ gimple_stmt_iterator gsi = gsi_after_labels (loop->header);
+ unsigned vec_size;
+
+ /* Create required IVs. */
+ vect_gen_ivs_for_masking (loop_vinfo, &ivs);
+ vectype = TREE_TYPE (ivs[0]);
+
+ vec_size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
+ iv_elems = TYPE_VECTOR_SUBPARTS (vectype);
+
+ /* Get a proper niter to build a vector. */
+ if (!is_gimple_val (niters))
+ {
+ gimple_seq seq = NULL;
+ niters = force_gimple_operand (niters, &seq, true, NULL);
+ gsi_insert_seq_on_edge_immediate (pe, seq);
+ }
+
+ /* We may need a type cast in case niter's type is too small
+ for the generated IVs. */
+ if (!types_compatible_p (TREE_TYPE (vectype), TREE_TYPE (niters)))
+ {
+ tree new_niters = make_temp_ssa_name (TREE_TYPE (vectype),
+ NULL, "niters");
+ stmt = gimple_build_assign (new_niters, CONVERT_EXPR, niters);
+ bb = gsi_insert_on_edge_immediate (pe, stmt);
+ gcc_assert (!bb);
+ niters = new_niters;
+ }
+
+ /* Create a {NITERS, ..., NITERS} vector and put it into an SSA_NAME. */
+ vec_niters_val = build_vector_from_val (vectype, niters);
+ vec_niters = vect_get_new_ssa_name (vectype, vect_simple_var, "niters");
+ stmt = gimple_build_assign (vec_niters, vec_niters_val);
+ bb = gsi_insert_on_edge_immediate (pe, stmt);
+ gcc_assert (!bb);
+
+ /* Determine which masks we need to compute and how many. */
+ vect_get_extreme_masks (loop_vinfo, &min_mask_elems, &max_mask_elems);
+ nmasks = vect_get_mask_index_for_elems (MIN (min_mask_elems, iv_elems) / 2);
+ masks->safe_grow_cleared (nmasks);
+
+ /* Now create base masks through comparison IV < VEC_NITERS. */
+ mask_type = build_same_sized_truth_vector_type (vectype);
+ cur_mask = vect_get_mask_index_for_elems (iv_elems);
+ for (unsigned i = 0; i < ivs.length (); i++)
+ {
+ tree iv = ivs[i];
+ mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
+ stmt = gimple_build_assign (mask, LT_EXPR, iv, vec_niters);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ (*masks)[cur_mask++] = mask;
+ }
+
+ /* Create narrowed masks. */
+ cur_mask_elems = iv_elems;
+ nmasks = ivs.length ();
+ while (cur_mask_elems < max_mask_elems)
+ {
+ prev_mask = vect_get_mask_index_for_elems (cur_mask_elems);
+
+ cur_mask_elems <<= 1;
+ nmasks >>= 1;
+
+ cur_mask = vect_get_mask_index_for_elems (cur_mask_elems);
+
+ mask_type = build_truth_vector_type (cur_mask_elems, vec_size);
+
+ for (unsigned i = 0; i < nmasks; i++)
+ {
+ tree mask_low = (*masks)[prev_mask++];
+ tree mask_hi = (*masks)[prev_mask++];
+ mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
+ stmt = gimple_build_assign (mask, VEC_PACK_TRUNC_EXPR,
+ mask_low, mask_hi);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ (*masks)[cur_mask++] = mask;
+ }
+ }
+
+ /* Create widened masks. */
+ cur_mask_elems = iv_elems;
+ nmasks = ivs.length ();
+ while (cur_mask_elems > min_mask_elems)
+ {
+ prev_mask = vect_get_mask_index_for_elems (cur_mask_elems);
+
+ cur_mask_elems >>= 1;
+ nmasks <<= 1;
+
+ cur_mask = vect_get_mask_index_for_elems (cur_mask_elems);
+
+ mask_type = build_truth_vector_type (cur_mask_elems, vec_size);
+
+ for (unsigned i = 0; i < nmasks; i += 2)
+ {
+ tree orig_mask = (*masks)[prev_mask++];
+
+ mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
+ stmt = gimple_build_assign (mask, VEC_UNPACK_LO_EXPR, orig_mask);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ (*masks)[cur_mask++] = mask;
+
+ mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
+ stmt = gimple_build_assign (mask, VEC_UNPACK_HI_EXPR, orig_mask);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ (*masks)[cur_mask++] = mask;
+ }
+ }
+}
+
+/* Function vect_mask_reduction_stmt.
+
+ Mask the given vectorized reduction statement STMT using
+ MASK. If the scalar reduction statement is vectorized
+ into several vector statements, then PREV holds the
+ preceding vector statement copy for STMT.
+
+ Masking is performed using VEC_COND_EXPR. E.g.
+
+ S1: r_1 = r_2 + d_3
+
+ is transformed into:
+
+ S1': r_4 = r_2 + d_3
+ S2': r_1 = VEC_COND_EXPR<MASK, r_4, r_2>
+
+ Return generated condition statement. */
+
+static gimple *
+vect_mask_reduction_stmt (gimple *stmt, tree mask, gimple *prev)
+{
+ gimple_stmt_iterator gsi;
+ tree vectype;
+ tree lhs, rhs, tmp;
+ gimple *new_stmt, *phi;
+
+ lhs = gimple_assign_lhs (stmt);
+ vectype = TREE_TYPE (lhs);
+
+ gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
+ == TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)));
+
+ /* Find operand RHS defined by PHI node. */
+ rhs = gimple_assign_rhs1 (stmt);
+ gcc_assert (TREE_CODE (rhs) == SSA_NAME);
+ phi = SSA_NAME_DEF_STMT (rhs);
+
+ if (phi != prev && gimple_code (phi) != GIMPLE_PHI)
+ {
+ rhs = gimple_assign_rhs2 (stmt);
+ gcc_assert (TREE_CODE (rhs) == SSA_NAME);
+ phi = SSA_NAME_DEF_STMT (rhs);
+ gcc_assert (phi == prev || gimple_code (phi) == GIMPLE_PHI);
+ }
+
+ /* Convert reduction stmt to ordinary assignment to TMP. */
+ tmp = vect_get_new_ssa_name (vectype, vect_simple_var, NULL);
+ gimple_assign_set_lhs (stmt, tmp);
+
+ /* Create VEC_COND_EXPR and insert it after STMT. */
+ new_stmt = gimple_build_assign (lhs, VEC_COND_EXPR, mask, tmp, rhs);
+ gsi = gsi_for_stmt (stmt);
+ gsi_insert_after (&gsi, new_stmt, GSI_SAME_STMT);
+
+ return new_stmt;
+}
+
+/* Function vect_mask_mask_load_store_stmt.
+
+ Mask the given vectorized MASK_LOAD or MASK_STORE statement
+ STMT using MASK. The function replaces the mask used by STMT
+ with its conjunction with MASK. */
+
+static void
+vect_mask_mask_load_store_stmt (gimple *stmt, tree mask)
+{
+ gimple *new_stmt;
+ tree old_mask, new_mask;
+ gimple_stmt_iterator gsi;
+
+ gsi = gsi_for_stmt (stmt);
+ old_mask = gimple_call_arg (stmt, 2);
+
+ gcc_assert (types_compatible_p (TREE_TYPE (old_mask), TREE_TYPE (mask)));
+
+ new_mask = vect_get_new_ssa_name (TREE_TYPE (mask), vect_simple_var, NULL);
+ new_stmt = gimple_build_assign (new_mask, BIT_AND_EXPR, old_mask, mask);
+ gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+ gimple_call_set_arg (stmt, 2, new_mask);
+ update_stmt (stmt);
+}
+
+
+/* Function vect_mask_load_store_stmt.
+
+ Mask the given vectorized load or store statement STMT using
+ MASK. DR is the data reference for the scalar memory access.
+ The assignment is transformed into a MASK_LOAD or MASK_STORE
+ statement. SI is either an iterator pointing to STMT, which is
+ to be updated, or NULL.
+
+static void
+vect_mask_load_store_stmt (gimple *stmt, tree vectype, tree mask,
+ data_reference *dr, gimple_stmt_iterator *si)
+{
+ tree mem, val, addr, ptr;
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ unsigned align, misalign;
+ tree elem_type = TREE_TYPE (vectype);
+ gimple *new_stmt;
+
+ gcc_assert (!si || gsi_stmt (*si) == stmt);
+
+ gsi = gsi_for_stmt (stmt);
+ if (gimple_store_p (stmt))
+ {
+ val = gimple_assign_rhs1 (stmt);
+ mem = gimple_assign_lhs (stmt);
+ }
+ else
+ {
+ val = gimple_assign_lhs (stmt);
+ mem = gimple_assign_rhs1 (stmt);
+ }
+
+ gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
+ == TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)));
+
+ addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (mem),
+ true, NULL_TREE, true,
+ GSI_SAME_STMT);
+
+ align = TYPE_ALIGN_UNIT (vectype);
+ if (aligned_access_p (dr))
+ misalign = 0;
+ else if (DR_MISALIGNMENT (dr) == -1)
+ {
+ align = TYPE_ALIGN_UNIT (elem_type);
+ misalign = 0;
+ }
+ else
+ misalign = DR_MISALIGNMENT (dr);
+ set_ptr_info_alignment (get_ptr_info (addr), align, misalign);
+ ptr = build_int_cst (reference_alias_ptr_type (mem),
+ misalign ? misalign & -misalign : align);
+
+ if (gimple_store_p (stmt))
+ new_stmt = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
+ mask, val);
+ else
+ {
+ new_stmt = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, ptr,
+ mask);
+ gimple_call_set_lhs (new_stmt, val);
+ }
+ gsi_replace (si ? si : &gsi, new_stmt, false);
+}
+
+/* Function vect_combine_loop_epilogue.
+
+ Combine the loop epilogue with the main vectorized body. This
+ requires masking of memory accesses and reductions. */
+
+static void
+vect_combine_loop_epilogue (loop_vec_info loop_vinfo)
+{
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+ unsigned mask_no;
+ auto_vec<tree> masks;
+
+ vect_gen_loop_masks (loop_vinfo, &masks);
+
+ /* Convert reduction statements if any. */
+ for (unsigned i = 0; i < LOOP_VINFO_REDUCTIONS (loop_vinfo).length (); i++)
+ {
+ gimple *stmt = LOOP_VINFO_REDUCTIONS (loop_vinfo)[i];
+ gimple *prev_stmt = NULL;
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+
+ mask_no = vect_get_mask_index_for_type (STMT_VINFO_VECTYPE (stmt_info));
+
+ stmt = STMT_VINFO_VEC_STMT (stmt_info);
+ while (stmt)
+ {
+ prev_stmt = vect_mask_reduction_stmt (stmt, masks[mask_no++],
+ prev_stmt);
+ stmt_info = vinfo_for_stmt (stmt);
+ stmt = stmt_info ? STMT_VINFO_RELATED_STMT (stmt_info) : NULL;
+ }
+ }
+
+ /* Scan all loop statements to convert vector loads/stores including
+ the masked forms. */
+ for (unsigned i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = bbs[i];
+ for (gimple_stmt_iterator si = gsi_start_bb (bb);
+ !gsi_end_p (si); gsi_next (&si))
+ {
+ gimple *stmt = gsi_stmt (si);
+ stmt_vec_info stmt_info = NULL;
+ tree vectype = NULL;
+ data_reference *dr;
+
+ /* Mask load case. */
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+ && !VECTOR_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, 2))))
+ {
+ stmt_info = vinfo_for_stmt (stmt);
+ if (!STMT_VINFO_VEC_STMT (stmt_info))
+ continue;
+ stmt = STMT_VINFO_VEC_STMT (stmt_info);
+ vectype = STMT_VINFO_VECTYPE (stmt_info);
+ }
+ /* Mask store case. */
+ else if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_STORE
+ && vinfo_for_stmt (stmt)
+ && STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (stmt)))
+ {
+ stmt_info = vinfo_for_stmt (stmt);
+ vectype = TREE_TYPE (gimple_call_arg (stmt, 2));
+ }
+ /* Load case. */
+ else if (gimple_assign_load_p (stmt)
+ && !VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+ {
+ stmt_info = vinfo_for_stmt (stmt);
+
+ /* Skip vector loads. */
+ if (!STMT_VINFO_VEC_STMT (stmt_info))
+ continue;
+
+ /* Skip invariant loads. */
+ if (integer_zerop (nested_in_vect_loop_p (loop, stmt)
+ ? STMT_VINFO_DR_STEP (stmt_info)
+ : DR_STEP (STMT_VINFO_DATA_REF (stmt_info))))
+ continue;
+ stmt = STMT_VINFO_VEC_STMT (stmt_info);
+ vectype = STMT_VINFO_VECTYPE (stmt_info);
+ }
+ /* Store case. */
+ else if (gimple_code (stmt) == GIMPLE_ASSIGN
+ && gimple_store_p (stmt)
+ && vinfo_for_stmt (stmt)
+ && STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (stmt)))
+ {
+ stmt_info = vinfo_for_stmt (stmt);
+ vectype = STMT_VINFO_VECTYPE (stmt_info);
+ }
+ else
+ continue;
+
+ /* Skip hoisted out statements. */
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ continue;
+
+ mask_no = vect_get_mask_index_for_type (vectype);
+
+ dr = STMT_VINFO_DATA_REF (stmt_info);
+ while (stmt)
+ {
+ if (is_gimple_call (stmt))
+ vect_mask_mask_load_store_stmt (stmt, masks[mask_no++]);
+ else
+ vect_mask_load_store_stmt (stmt, vectype, masks[mask_no++], dr,
+ /* Have to update iterator only if
+ it points to stmt we mask. */
+ stmt == gsi_stmt (si) ? &si : NULL);
+
+ stmt_info = vinfo_for_stmt (stmt);
+ stmt = stmt_info ? STMT_VINFO_RELATED_STMT (stmt_info) : NULL;
+ }
+ }
+ }
+
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "=== Loop epilogue was combined ===\n");
+}
/* Function vect_transform_loop.
@@ -6936,7 +7568,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
run at least the vectorization factor number of times checking
is pointless, too. */
th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
- if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
+ if ((th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
+ || (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)
+ && th > 1))
&& !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
if (dump_enabled_p ())
@@ -6985,12 +7619,18 @@ vect_transform_loop (loop_vec_info loop_vinfo)
{
tree ratio_mult_vf;
if (!ni_name)
- ni_name = vect_build_loop_niters (loop_vinfo);
+ {
+ ni_name = vect_build_loop_niters (loop_vinfo);
+ LOOP_VINFO_NITERS (loop_vinfo) = ni_name;
+ }
vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf,
&ratio);
- epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name,
- ratio_mult_vf, th,
- check_profitability);
+ /* If the epilogue is combined with the main loop, peeling is not needed. */
+ if (!LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)
+ || check_profitability)
+ epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name,
+ ratio_mult_vf, th,
+ check_profitability);
}
else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
@@ -6998,7 +7638,10 @@ vect_transform_loop (loop_vec_info loop_vinfo)
else
{
if (!ni_name)
- ni_name = vect_build_loop_niters (loop_vinfo);
+ {
+ ni_name = vect_build_loop_niters (loop_vinfo);
+ LOOP_VINFO_NITERS (loop_vinfo) = ni_name;
+ }
vect_generate_tmps_on_preheader (loop_vinfo, ni_name, NULL, &ratio);
}
@@ -7252,6 +7895,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
slpeel_make_loop_iterate_ntimes (loop, ratio);
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ vect_combine_loop_epilogue (loop_vinfo);
+
/* Reduce loop iterations by the vectorization factor. */
scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
expected_iterations / vectorization_factor);
@@ -7263,20 +7909,28 @@ vect_transform_loop (loop_vec_info loop_vinfo)
loop->nb_iterations_likely_upper_bound
= loop->nb_iterations_likely_upper_bound - 1;
}
- loop->nb_iterations_upper_bound
- = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
- vectorization_factor) - 1;
- loop->nb_iterations_likely_upper_bound
- = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + 1,
- vectorization_factor) - 1;
+
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ loop->nb_iterations_upper_bound
+ = wi::div_ceil (loop->nb_iterations_upper_bound + 1,
+ vectorization_factor, UNSIGNED) - 1;
+ else
+ loop->nb_iterations_upper_bound
+ = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
+ vectorization_factor) - 1;
if (loop->any_estimate)
{
- loop->nb_iterations_estimate
- = wi::udiv_floor (loop->nb_iterations_estimate, vectorization_factor);
- if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
- && loop->nb_iterations_estimate != 0)
- loop->nb_iterations_estimate = loop->nb_iterations_estimate - 1;
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+ loop->nb_iterations_estimate
+ = wi::div_ceil (loop->nb_iterations_estimate, vectorization_factor,
+ UNSIGNED);
+ else
+ loop->nb_iterations_estimate
+ = wi::udiv_floor (loop->nb_iterations_estimate, vectorization_factor);
+ if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+ && loop->nb_iterations_estimate != 0)
+ loop->nb_iterations_estimate -= 1;
}
if (dump_enabled_p ())
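P.S. For reference, a tiny standalone model of the mask indexing
scheme used by vect_get_mask_index_for_elems above (the vector size
is just an example value):

  /* Model of the masks vector layout: index = VS / mask_elems - 1,
     where VS stands in for current_vector_size (16 here, arbitrary).
     Masks with VS elements start at index 0, VS/2-element masks at
     index 1, VS/4-element masks at index 3; each group holds
     VS / mask_elems masks ordered by iteration number.  */
  #include <stdio.h>

  static const unsigned vs = 16;  /* stands in for current_vector_size */

  static unsigned
  mask_index_for_elems (unsigned mask_elems)
  {
    return vs / mask_elems - 1;
  }

  int
  main (void)
  {
    for (unsigned elems = vs; elems >= 4; elems /= 2)
      printf ("%2u-element masks: indices %u..%u\n", elems,
              mask_index_for_elems (elems),
              mask_index_for_elems (elems) + vs / elems - 1);
    return 0;
  }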