From: "Kewen.Lin" <linkw@linux.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: Bill Schmidt <wschmidt@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Richard Biener <richard.guenther@gmail.com>
Subject: [PATCH v5] vect/rs6000: Support vector with length cost modeling
Date: Fri, 31 Jul 2020 22:51:06 +0800 [thread overview]
Message-ID: <ee92269f-2cf6-546f-ff65-53f5970e5d82@linux.ibm.com> (raw)
In-Reply-To: <mptime9f53u.fsf@arm.com>
[-- Attachment #1: Type: text/plain, Size: 827 bytes --]
Hi Richard,
New version v5 is attached.
v5 main changes against v4:
1) use _stmt instead of _cnt to avoid confusion
2) factor out function vect_rgroup_iv_might_wrap_p
3) use generic scalar_stmt for min/max stmt
Does this look better? Thanks in advance!
BR,
Kewen
-----
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
function.
(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
(vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this.
Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
[-- Attachment #2: cost_v5.diff --]
[-- Type: text/plain, Size: 10429 bytes --]
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 009afc5f894..86ef584e09b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5177,6 +5177,34 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
return retval;
}
+/* For some target specific vectorization cost which can't be handled per stmt,
+ we check the requisite conditions and adjust the vectorization cost
+ accordingly if satisfied. One typical example is to model shift cost for
+ vector with length by counting number of required lengths under condition
+ LOOP_VINFO_FULLY_WITH_LENGTH_P. */
+
+static void
+rs6000_adjust_vect_cost_per_loop (rs6000_cost_data *data)
+{
+ struct loop *loop = data->loop_info;
+ gcc_assert (loop);
+ loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
+
+ if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+ rgroup_controls *rgc;
+ unsigned int num_vectors_m1;
+ unsigned int shift_cnt = 0;
+ FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+ if (rgc->type)
+ /* Each length needs one shift to fill into bits 0-7. */
+ shift_cnt += num_vectors_m1 + 1;
+
+ rs6000_add_stmt_cost (loop_vinfo, (void *) data, shift_cnt, scalar_stmt,
+ NULL, NULL_TREE, 0, vect_body);
+ }
+}
+
/* Implement targetm.vectorize.finish_cost. */
static void
@@ -5186,7 +5214,10 @@ rs6000_finish_cost (void *data, unsigned *prologue_cost,
rs6000_cost_data *cost_data = (rs6000_cost_data*) data;
if (cost_data->loop_info)
- rs6000_density_test (cost_data);
+ {
+ rs6000_adjust_vect_cost_per_loop (cost_data);
+ rs6000_density_test (cost_data);
+ }
/* Don't vectorize minimum-vectorization-factor, simple copy loops
that require versioning for any reason. The vectorization is at
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 490e7befea3..2a0d3b1840d 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -412,7 +412,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
This means that we cannot guarantee that such an induction variable
would ever hit a value that produces a set of all-false masks or zero
- lengths for RGC. */
+ lengths for RGC.
+
+ Note that please check cost modeling whether needed to be updated in
+ function vect_estimate_min_profitable_iters if any changes. */
static tree
vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
@@ -711,8 +714,6 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
else
niters = gimple_convert (&preheader_seq, compare_type, niters);
- widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo);
-
/* Iterate over all the rgroups and fill in their controls. We could use
the first control from any rgroup for the loop condition; here we
arbitrarily pick the last. */
@@ -739,11 +740,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
/* See whether zero-based IV would ever generate all-false masks
or zero length before wrapping around. */
- unsigned nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
- bool might_wrap_p
- = (iv_limit == -1
- || (wi::min_precision (iv_limit * nitems_per_iter, UNSIGNED)
- > compare_precision));
+ bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
/* Set up all controls for this group. */
test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 43ec4fb04d5..dea9ca6778f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3798,6 +3798,58 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
(void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
vector_stmt, NULL, NULL_TREE, 0, vect_body);
}
+ else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+ /* Referring to the functions vect_set_loop_condition_partial_vectors
+ and vect_set_loop_controls_directly, we need to generate each
+ length in the prologue and in the loop body if required. Although
+ there are some possible optimizations, we consider the worst case
+ here. */
+
+ bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo);
+ bool need_iterate_p
+ = (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+ && !vect_known_niters_smaller_than_vf (loop_vinfo));
+
+ /* Calculate how many statements to be added. */
+ unsigned int prologue_stmt = 0;
+ unsigned int body_stmt = 0;
+
+ rgroup_controls *rgc;
+ unsigned int num_vectors_m1;
+ FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+ if (rgc->type)
+ {
+ /* May need one SHIFT for nitems_total computation. */
+ unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor;
+ if (nitems != 1 && !niters_known_p)
+ prologue_stmt += 1;
+
+ /* May need one MAX and one MINUS for wrap around. */
+ if (vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc))
+ prologue_stmt += 2;
+
+ /* Need one MAX and one MINUS for each batch limit excepting for
+ the 1st one. */
+ prologue_stmt += num_vectors_m1 * 2;
+
+ unsigned int num_vectors = num_vectors_m1 + 1;
+
+ /* Need to set up lengths in prologue, only one MIN required
+ for each since start index is zero. */
+ prologue_stmt += num_vectors;
+
+ /* Each may need two MINs and one MINUS to update lengths in body
+ for next iteration. */
+ if (need_iterate_p)
+ body_stmt += 3 * num_vectors;
+ }
+
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, prologue_stmt,
+ scalar_stmt, NULL, NULL_TREE, 0, vect_prologue);
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, body_stmt,
+ scalar_stmt, NULL, NULL_TREE, 0, vect_body);
+ }
/* FORNOW: The scalar outside cost is incremented in one of the
following ways:
@@ -3932,8 +3984,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
}
/* ??? The "if" arm is written to handle all cases; see below for what
- we would do for !LOOP_VINFO_FULLY_MASKED_P. */
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P. */
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* Rewriting the condition above in terms of the number of
vector iterations (vniters) rather than the number of
@@ -3960,7 +4012,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
dump_printf (MSG_NOTE, " Minimum number of vector iterations: %d\n",
min_vec_niters);
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* Now that we know the minimum number of vector iterations,
find the minimum niters for which the scalar cost is larger:
@@ -4015,6 +4067,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
&& min_profitable_iters < (assumed_vf + peel_iters_prologue))
/* We want the vectorized loop to execute at least once. */
min_profitable_iters = assumed_vf + peel_iters_prologue;
+ else if (min_profitable_iters < peel_iters_prologue)
+ /* For LOOP_VINFO_USING_PARTIAL_VECTORS_P, we need to ensure the
+ vectorized loop to execute at least once. */
+ min_profitable_iters = peel_iters_prologue;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -4032,7 +4088,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
if (vec_outside_cost <= 0)
min_profitable_estimate = 0;
- else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* This is a repeat of the code above, but with + SOC rather
than - SOC. */
@@ -4044,7 +4100,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
if (outside_overhead > 0)
min_vec_niters = outside_overhead / saving_per_viter + 1;
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
int threshold = (vec_inside_cost * min_vec_niters
+ vec_outside_cost
@@ -9351,3 +9407,25 @@ vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo)
return iv_limit;
}
+/* For the given rgroup_controls RGC, check whether an induction variable
+ would ever hit a value that produces a set of all-false masks or zero
+ lengths before wrapping around. Return true if it's possible to wrap
+ around before hitting the desirable value, otherwise return false. */
+
+bool
+vect_rgroup_iv_might_wrap_p (loop_vec_info loop_vinfo, rgroup_controls *rgc)
+{
+ widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo);
+
+ if (iv_limit == -1)
+ return true;
+
+ tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
+ unsigned int compare_precision = TYPE_PRECISION (compare_type);
+ unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor;
+
+ if (wi::min_precision (iv_limit * nitems, UNSIGNED) > compare_precision)
+ return true;
+
+ return false;
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 31af46ae19c..8550a252f44 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12090,7 +12090,10 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
min_of_start_and_end = min (START_INDEX, END_INDEX);
left_len = END_INDEX - min_of_start_and_end;
rhs = min (left_len, LEN_LIMIT);
- LEN = rhs; */
+ LEN = rhs;
+
+ Note that please check cost modeling whether needed to be updated in
+ function vect_estimate_min_profitable_iters if any changes. */
gimple_seq
vect_gen_len (tree len, tree start_index, tree end_index, tree len_limit)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 5466c78c20b..7ab6d8234ac 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1960,6 +1960,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
/* In tree-vect-loop.c. */
extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
+bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
/* Used in tree-vect-loop-manip.c */
extern void determine_peel_for_niter (loop_vec_info);
/* Used in gimple-loop-interchange.c and tree-parloops.c. */
next prev parent reply other threads:[~2020-07-31 14:51 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-21 5:51 [PATCH] vect: " Kewen.Lin
2020-07-21 7:57 ` Richard Biener
2020-07-22 1:26 ` [PATCH v2] vect/rs6000: " Kewen.Lin
2020-07-22 6:38 ` Richard Biener
2020-07-22 7:08 ` Kewen.Lin
2020-07-22 9:11 ` Richard Sandiford
2020-07-22 15:48 ` [PATCH v3] " Kewen.Lin
2020-07-22 16:25 ` Kewen.Lin
2020-07-24 16:21 ` Richard Sandiford
2020-07-27 3:58 ` [PATCH v4] " Kewen.Lin
2020-07-27 13:40 ` Richard Sandiford
2020-07-28 8:36 ` Kewen.Lin
2020-07-31 11:03 ` Richard Sandiford
2020-07-31 11:20 ` Richard Biener
2020-07-31 12:37 ` Kewen.Lin
2020-07-31 13:01 ` Richard Biener
2020-07-31 13:21 ` Kewen.Lin
2020-07-31 14:51 ` Kewen.Lin [this message]
2020-08-05 7:27 ` [PATCH v5] " Richard Sandiford
2020-08-05 14:06 ` Segher Boessenkool
2020-08-06 6:47 ` Kewen.Lin
2020-07-22 17:49 ` [PATCH v2] " Segher Boessenkool
2020-07-27 3:44 ` Kewen.Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ee92269f-2cf6-546f-ff65-53f5970e5d82@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
--cc=richard.sandiford@arm.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).