From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
Date: Tue, 12 Oct 2021 11:35:08 +0100
Message-ID: <623fbfd9-b97c-8c6e-0348-07d6c4496592@arm.com>
In-Reply-To: <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 8479 bytes --]
Hi Richi,
I think this is what you meant: I now hide all the unrolling cost
calculations in the existing target cost hooks. I did need to
adjust 'finish_cost' to take the loop_vinfo so that the targets'
implementations are able to set the newly renamed 'suggested_unroll_factor'.
I also added the checks for the epilogue's VF.
Is this more like what you had in mind?
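
To make the idea concrete, here is a minimal standalone sketch of a
finish_cost-style hook that returns the accumulated costs and also records
a suggested unroll factor when it is handed a loop info. Everything named
toy_* is hypothetical, and the issue-width heuristic is made up purely for
illustration; it is not what any target in the patch does.

```cpp
#include <cassert>

// Hypothetical stand-in for loop_vec_info: only the field the hook writes.
struct toy_loop_info
{
  unsigned suggested_unroll_factor = 1;
};

// Hypothetical cost data: the three accumulators plus two assumed inputs
// for the made-up heuristic below.
struct toy_costs
{
  unsigned prologue = 0, body = 0, epilogue = 0;
  unsigned body_ops = 1;     // assumed: vector ops in one loop body
  unsigned issue_width = 1;  // assumed: ops the target can issue per cycle
};

// Return the accumulated costs and, if VINFO is non-null, suggest how many
// times to unroll.  A null VINFO models the scalar costing call site.
void
toy_finish_cost (toy_loop_info *vinfo, const toy_costs *data,
                 unsigned *prologue_cost, unsigned *body_cost,
                 unsigned *epilogue_cost)
{
  assert (data && prologue_cost && body_cost && epilogue_cost);
  *prologue_cost = data->prologue;
  *body_cost = data->body;
  *epilogue_cost = data->epilogue;

  if (!vinfo)
    return;

  // Made-up heuristic: keep doubling while the unrolled body would still
  // fit in the assumed issue width, capped at a factor of 4.
  unsigned factor = 1;
  while (factor < 4 && data->body_ops * factor * 2 <= data->issue_width)
    factor *= 2;
  vinfo->suggested_unroll_factor = factor;
}
```

With body_ops = 2 and issue_width = 8 this toy hook suggests a factor of 4;
with body_ops = 8 it leaves the factor at 1.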
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_finish_cost): Add class
vec_info parameter.
* config/i386/i386.c (ix86_finish_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.
* doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.
* target.def: Add class vec_info parameter to finish_cost.
* targhooks.c (default_finish_cost): Likewise.
* targhooks.h (default_finish_cost): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Use
suggested_unroll_factor to increase vectorization_factor if possible.
(_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor
member.
(vect_compute_single_scalar_iteration_cost): Adjust call to
finish_cost.
(vect_determine_partial_vectors_and_peeling): Ensure unrolled
loop is not predicated.
(vect_determine_unroll_factor): New.
(vect_try_unrolling): New.
(vect_reanalyze_as_main_loop): Also try to unroll when
reanalyzing as main loop.
(vect_analyze_loop): Add call to vect_try_unrolling and check
to ensure epilogue is either a smaller VF than main loop or uses
partial vectors and might be of equal VF.
(vect_estimate_min_profitable_iters): Adjust call to finish_cost.
(vectorizable_reduction): Make sure to not use
single_defuse_cycle when unrolling.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust
call to finish_cost.
* tree-vectorizer.h (finish_cost): Change to pass new class
vec_info parameter.
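
As an illustration of the vect_determine_vectorization_factor entry above,
the following standalone sketch mirrors the loop that folds the suggested
unroll factor into the VF. It uses plain unsigned values instead of
poly_uint64, and the function name apply_unroll_factor and the max_vf
parameter are assumptions made for this example only.

```cpp
#include <cassert>

// Multiply VF by the suggested unroll factor, halving the factor until the
// candidate VF fits under MAX_VF.  If no factor above 1 fits, VF is
// returned unchanged.
unsigned
apply_unroll_factor (unsigned vf, unsigned suggested, unsigned max_vf)
{
  assert (vf >= 1 && suggested >= 1);
  unsigned factor = suggested;
  while (factor > 1)
    {
      // Widen before multiplying so the comparison cannot overflow.
      unsigned long long candidate = (unsigned long long) vf * factor;
      if (candidate <= max_vf)
        return (unsigned) candidate;
      factor /= 2;
    }
  return vf;
}
```

Note that integer halving means a suggested factor of 3 falls straight to 1
if 3 * VF does not fit, so a factor of 2 is never tried in that case.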
On 01/10/2021 09:19, Richard Biener wrote:
> On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>>
>>>> That just forces trying the vector modes we've tried before. Though I might
>>>> need to revisit this now I think about it. I'm afraid it might be possible
>>>> for
>>>> this to generate an epilogue with a vf that is not lower than that of the
>>>> main
>>>> loop, but I'd need to think about this again.
>>>>
>>>> Either way I don't think this changes the vector modes used for the
>>>> epilogue.
>>>> But maybe I'm just missing your point here.
>>> Yes, I was referring to the above which suggests that when we vectorize
>>> the main loop with V4SF but unroll then we try vectorizing the
>>> epilogue with V4SF as well (but not unrolled). I think that's
>>> premature (not sure if you try V8SF if the main loop was V4SF but
>>> unrolled 4 times).
>> My main motivation for this was because I had a SVE loop that vectorized with
>> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll
>> V8HI by two and skipped using VNx8HI as a predicated epilogue which would've
>> been the best choice.
> I see, yes - for fully predicated epilogues it makes sense to consider
> the same vector mode as for the main loop anyways (independent on
> whether we're unrolling or not). One could argue that with an
> unrolled V4SImode main loop a predicated V8SImode epilogue would also
> be a good match (but then somehow costing favored the unrolled V4SI
> over the V8SI for the main loop...).
>
>> So that is why I decided to just 'reset' the vector_mode selection. In a
>> scenario where you only have the traditional vector modes it might make less
>> sense.
>>
>> Just realized I still didn't add any check to make sure the epilogue has a
>> lower VF than the previous loop, though I'm still not sure that could happen.
>> I'll go look at where to add that if you agree with this.
> As said above, it only needs a lower VF in case the epilogue is not
> fully masked - otherwise the same VF would be OK.
>
>>>> I can move it there, it would indeed remove the need for the change to
>>>> vect_update_vf_for_slp, the change to
>>>> vect_determine_partial_vectors_and_peeling would still be required I think.
>>>> It
>>>> is meant to disable using partial vectors in an unrolled loop.
>>> Why would we disable the use of partial vectors in an unrolled loop?
>> The motivation behind that is that the overhead caused by generating
>> predicates for each iteration will likely be too much for it to be profitable
>> to unroll. On top of that, when dealing with low iteration count loops, if
>> executing one predicated iteration would be enough we now still need to
>> execute all other unrolled predicated iterations, whereas if we keep them
>> unrolled we skip the unrolled loops.
> OK, I guess we're not factoring in costs when deciding on predication
> but go for it if it's generally enabled and possible.
>
> With the proposed scheme we'd then cost the predicated not unrolled
> loop against a not predicated unrolled loop which might be a bit
> apples vs. oranges also because the target made the unroll decision
> based on the data it collected for the predicated loop.
>
>>> Sure but I'm suggesting you keep the not unrolled body as one way of
>>> costed vectorization but then if the target says "try unrolling"
>>> re-do the analysis with the same mode but a larger VF. Just like
>>> we iterate over vector modes you'll now iterate over pairs of
>>> vector mode + VF (unroll factor). It's not about re-using the costing
>>> it's about using costing that is actually relevant and also to avoid
>>> targets inventing two distinct separate costings - a target (powerpc)
>>> might already compute load/store density and other stuff for the main
>>> costing so it should have an idea whether doubling or triplicating is OK.
>>>
>>> Richard.
>> Sounds good! I changed the patch to determine the unrolling factor later,
>> after all analysis has been done and retry analysis if an unrolling factor
>> larger than 1 has been chosen for this loop and vector_mode.
>>
>> gcc/ChangeLog:
>>
>> * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR.
>> * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR.
>> * params.opt: Add vect-unroll and vect-unroll-reductions
>> parameters.
> What's the reason to add the --params? It looks like this makes
> us unroll with a static number short-cutting the target.
>
> IMHO that's never going to be a great thing - but what we could do
> is look at loop->unroll and try to honor that (factoring in that
> the vectorization factor is already the times we unroll).
>
> So I'd leave those params out for now, the user would have a much
> more fine-grained way to control this with the unroll pragma.
>
> Adding a max-vect-unroll parameter would be another thing but that
> would apply after the targets or pragma decision.
>
>> * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR.
> I still do not like the new target hook - as said I'd like to
> make you have the finish_cost hook allow the target to specify
> a suggested unroll factor instead because that's the point where
> it has all the info.
>
> Thanks,
> Richard.
>
>> * targhooks.c (default_unroll_factor): New.
>> * targhooks.h (default_unroll_factor): Likewise.
>> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>> par_unrolling_factor.
>> (vect_determine_partial_vectors_and_peeling): Account for
>> unrolling.
>> (vect_determine_unroll_factor): New.
>> (vect_try_unrolling): New.
>> (vect_reanalyze_as_main_loop): Call vect_try_unrolling when
>> retrying a loop_vinfo as a main loop.
>> (vect_analyze_loop): Call vect_try_unrolling when vectorizing
>> main loops.
>> (vect_analyze_loop): Allow for epilogue vectorization when unrolling
>> and rewalk vector_mode array for the epilogues.
>> (vectorizable_reduction): Disable single_defuse_cycle when
>> unrolling.
>> * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor
>> as a member of loop_vec_info.
>>
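
The epilogue rule discussed in the quoted thread (a lower VF is only
required when the epilogue is not fully masked) can be sketched as a small
predicate. This is a simplification under stated assumptions: it uses plain
unsigned VFs rather than poly_uint64 with known_lt/maybe_eq, and the name
epilogue_vf_ok is hypothetical.

```cpp
#include <cassert>

// Accept an epilogue candidate if its VF is strictly smaller than the main
// loop's VF, or if it uses partial vectors and has the same VF.
bool
epilogue_vf_ok (unsigned epilogue_vf, unsigned main_vf,
                bool epilogue_uses_partial_vectors)
{
  assert (epilogue_vf >= 1 && main_vf >= 1);
  if (epilogue_vf < main_vf)
    return true;
  return epilogue_uses_partial_vectors && epilogue_vf == main_vf;
}
```

For a main loop with VF 16 (for example a VF of 8 unrolled by two), an
unpredicated epilogue with VF 8 is accepted, an unpredicated one with VF 16
is rejected, and a predicated one with VF 16 is accepted.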
[-- Attachment #2: vect_unroll3.patch --]
[-- Type: text/plain, Size: 21115 bytes --]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 36519ccc5a58abab483c38d0a6c5f039592bfc7f..e6ccb66ba41895c4583a959d03ac3f0f173adae6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15972,8 +15972,9 @@ aarch64_adjust_body_cost (aarch64_vector_costs *costs, unsigned int body_cost)
/* Implement TARGET_VECTORIZE_FINISH_COST. */
static void
-aarch64_finish_cost (void *data, unsigned *prologue_cost,
- unsigned *body_cost, unsigned *epilogue_cost)
+aarch64_finish_cost (class vec_info *vinfo ATTRIBUTE_UNUSED, void *data,
+ unsigned *prologue_cost, unsigned *body_cost,
+ unsigned *epilogue_cost)
{
auto *costs = static_cast<aarch64_vector_costs *> (data);
*prologue_cost = costs->region[vect_prologue];
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index afc2674d49da370ae0f5ef277df7e9954f303b8e..de7bb9fe62fcec53ee40a4798f24c6ccd4584736 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23048,8 +23048,9 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
/* Implement targetm.vectorize.finish_cost. */
static void
-ix86_finish_cost (void *data, unsigned *prologue_cost,
- unsigned *body_cost, unsigned *epilogue_cost)
+ix86_finish_cost (class vec_info *vinfo ATTRIBUTE_UNUSED, void *data,
+ unsigned *prologue_cost, unsigned *body_cost,
+ unsigned *epilogue_cost)
{
unsigned *cost = (unsigned *) data;
*prologue_cost = cost[vect_prologue];
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad81dfb316dff00cde810d6b1edd31fa49d5c1e8..6f674b6426284dbf9b9f8fdd85515cf9702adff6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5551,8 +5551,9 @@ rs6000_adjust_vect_cost_per_loop (rs6000_cost_data *data)
/* Implement targetm.vectorize.finish_cost. */
static void
-rs6000_finish_cost (void *data, unsigned *prologue_cost,
- unsigned *body_cost, unsigned *epilogue_cost)
+rs6000_finish_cost (class vec_info *vinfo ATTRIBUTE_UNUSED, void *data,
+ unsigned *prologue_cost, unsigned *body_cost,
+ unsigned *epilogue_cost)
{
rs6000_cost_data *cost_data = (rs6000_cost_data*) data;
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index be8148583d8571b0d035b1938db9d056bfd213a8..05ddd4c58a3711dd949b28da3e61fb49d8175257 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6276,7 +6276,7 @@ return value should be viewed as a tentative cost that may later be
revised.
@end deftypefn
-@deftypefn {Target Hook} void TARGET_VECTORIZE_FINISH_COST (void *@var{data}, unsigned *@var{prologue_cost}, unsigned *@var{body_cost}, unsigned *@var{epilogue_cost})
+@deftypefn {Target Hook} void TARGET_VECTORIZE_FINISH_COST (class vec_info *@var{vinfo}, void *@var{data}, unsigned *@var{prologue_cost}, unsigned *@var{body_cost}, unsigned *@var{epilogue_cost})
This hook should complete calculations of the cost of vectorizing a loop
or basic block based on @var{data}, and return the prologue, body, and
epilogue costs as unsigned integers. The default returns the value of
diff --git a/gcc/target.def b/gcc/target.def
index bfa819609c21bd71c0cc585c01dba42534453f47..f0be0e10a9225dd75b013535d8e42c1d1bfe8f50 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2081,8 +2081,8 @@ or basic block based on @var{data}, and return the prologue, body, and\n\
epilogue costs as unsigned integers. The default returns the value of\n\
the three accumulators.",
void,
- (void *data, unsigned *prologue_cost, unsigned *body_cost,
- unsigned *epilogue_cost),
+ (class vec_info *vinfo, void *data, unsigned *prologue_cost,
+ unsigned *body_cost, unsigned *epilogue_cost),
default_finish_cost)
/* Function to delete target-specific cost modeling data. */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 92d51992e625c2497aa8496b1e2e3d916e5706fd..6fd1fade49cfe00295afd52aee7a34931bb48b92 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -123,7 +123,8 @@ extern unsigned default_add_stmt_cost (class vec_info *, void *, int,
enum vect_cost_for_stmt,
class _stmt_vec_info *, tree, int,
enum vect_cost_model_location);
-extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
+extern void default_finish_cost (class vec_info *, void *, unsigned *,
+ unsigned *, unsigned *);
extern void default_destroy_cost_data (void *);
/* OpenACC hooks. */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index c9b5208853dbc15706a65d1eb335e28e0564325e..0a3ecfa76406152ce79aaf19c5a2cc8b652936ff 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1518,8 +1518,9 @@ default_add_stmt_cost (class vec_info *vinfo, void *data, int count,
/* By default, the cost model just returns the accumulated costs. */
void
-default_finish_cost (void *data, unsigned *prologue_cost,
- unsigned *body_cost, unsigned *epilogue_cost)
+default_finish_cost (class vec_info *vinfo, void *data,
+ unsigned *prologue_cost, unsigned *body_cost,
+ unsigned *epilogue_cost)
{
unsigned *cost = (unsigned *) data;
*prologue_cost = cost[vect_prologue];
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 5a5b8da2e771a1dd204f22a6447eba96bb3b352c..50256cb6cb478246e3402162391096cbbc7fde94 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -365,6 +365,24 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
if (known_le (vectorization_factor, 1U))
return opt_result::failure_at (vect_location,
"not vectorized: unsupported data-type\n");
+ /* Apply the unrolling factor that was determined by
+ vect_determine_unroll_factor the first time we ran the analysis for this
+ vector mode. */
+ if (loop_vinfo->suggested_unroll_factor > 1)
+ {
+ unsigned unrolling_factor = loop_vinfo->suggested_unroll_factor;
+ while (unrolling_factor > 1)
+ {
+ poly_uint64 candidate_factor = vectorization_factor * unrolling_factor;
+ if (estimated_poly_value (candidate_factor, POLY_VALUE_MAX)
+ <= (HOST_WIDE_INT) LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo))
+ {
+ vectorization_factor = candidate_factor;
+ break;
+ }
+ unrolling_factor /= 2;
+ }
+ }
LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
return opt_result::success ();
}
@@ -828,6 +846,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
skip_main_loop_edge (nullptr),
skip_this_loop_edge (nullptr),
reusable_accumulators (),
+ suggested_unroll_factor (1),
max_vectorization_factor (0),
mask_skip_niters (NULL_TREE),
rgroup_compare_type (NULL_TREE),
@@ -1301,7 +1320,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
si->kind, si->stmt_info, si->vectype,
si->misalign, si->where);
unsigned prologue_cost = 0, body_cost = 0, epilogue_cost = 0;
- finish_cost (target_cost_data, &prologue_cost, &body_cost,
+ finish_cost (NULL, target_cost_data, &prologue_cost, &body_cost,
&epilogue_cost);
destroy_cost_data (target_cost_data);
LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
@@ -2128,10 +2147,16 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
vectors to the epilogue, with the main loop continuing to operate
on full vectors.
+ If we are unrolling we also do not want to use partial vectors. This
+ is to avoid the overhead of generating multiple masks and also to
+ avoid having to execute entire iterations of FALSE masked instructions
+ when dealing with one or fewer full iterations.
+
??? We could then end up failing to use partial vectors if we
decide to peel iterations into a prologue, and if the main loop
then ends up processing fewer than VF iterations. */
- if (param_vect_partial_vector_usage == 1
+ if ((param_vect_partial_vector_usage == 1
+ || loop_vinfo->suggested_unroll_factor > 1)
&& !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
&& !vect_known_niters_smaller_than_vf (loop_vinfo))
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
@@ -2879,6 +2904,121 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
return true;
}
+/* Determine whether we should unroll this loop and ask target how much to
+ unroll by. */
+
+static opt_loop_vec_info
+vect_determine_unroll_factor (loop_vec_info loop_vinfo)
+{
+ stmt_vec_info stmt_info;
+ unsigned i;
+ bool seen_reduction_p = false;
+ poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+ FOR_EACH_VEC_ELT (loop_vinfo->stmt_vec_infos, i, stmt_info)
+ {
+ if (STMT_VINFO_IN_PATTERN_P (stmt_info)
+ || !STMT_VINFO_RELEVANT_P (stmt_info)
+ || stmt_info->vectype == NULL_TREE)
+ continue;
+ /* Do not unroll loops with negative steps as it is unlikely that
+ vectorization will succeed due to the way we deal with negative steps
+ in loads and stores in 'get_load_store_type'. */
+ if (stmt_info->dr_aux.dr
+ && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+ {
+ dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
+ tree step = vect_dr_behavior (loop_vinfo, dr_info)->step;
+ if (TREE_CODE (step) == INTEGER_CST
+ && tree_int_cst_compare (step, size_zero_node) < 0)
+ {
+ return opt_loop_vec_info::failure_at
+ (vect_location, "could not unroll due to negative step\n");
+ }
+ }
+
+ if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
+ {
+ auto red_info = info_for_reduction (loop_vinfo, stmt_info);
+ if (STMT_VINFO_REDUC_TYPE (red_info) == TREE_CODE_REDUCTION)
+ seen_reduction_p = true;
+ else
+ {
+ return opt_loop_vec_info::failure_at
+ (vect_location, "could not unroll loop with reduction due to "
+ "non TREE_CODE_REDUCTION\n");
+ }
+ }
+ }
+
+ if (known_le (vectorization_factor, 1U))
+ return opt_loop_vec_info::failure_at (vect_location,
+ "will not unroll loop with a VF of 1"
+ " or less\n");
+
+ opt_loop_vec_info unrolled_vinfo
+ = opt_loop_vec_info::success (vect_analyze_loop_form (loop_vinfo->loop,
+ loop_vinfo->shared));
+ unrolled_vinfo->vector_mode = loop_vinfo->vector_mode;
+ /* Use the suggested_unroll_factor that was set during the target's
+ TARGET_VECTORIZE_FINISH_COST hook. */
+ unrolled_vinfo->suggested_unroll_factor = loop_vinfo->suggested_unroll_factor;
+ return unrolled_vinfo;
+}
+
+
+/* Try to unroll the current loop. First determine the unrolling factor using
+ the analysis done for the current vector mode. Then re-analyze the loop for
+ the given unrolling factor and the current vector mode. */
+
+static opt_loop_vec_info
+vect_try_unrolling (opt_loop_vec_info loop_vinfo, unsigned *n_stmts)
+{
+ DUMP_VECT_SCOPE ("vect_try_unrolling");
+
+ opt_loop_vec_info unrolled_vinfo = vect_determine_unroll_factor (loop_vinfo);
+ /* Reset unrolling factor, in case we decide to not unroll. */
+ loop_vinfo->suggested_unroll_factor = 1;
+ if (unrolled_vinfo)
+ {
+ if (unrolled_vinfo->suggested_unroll_factor > 1)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** unrolling factor %d chosen for vector mode %s,"
+ " re-trying analysis...\n",
+ unrolled_vinfo->suggested_unroll_factor,
+ GET_MODE_NAME (unrolled_vinfo->vector_mode));
+ bool unrolling_fatal = false;
+ if (vect_analyze_loop_2 (unrolled_vinfo, unrolling_fatal, n_stmts)
+ && known_ne (loop_vinfo->vectorization_factor,
+ unrolled_vinfo->vectorization_factor))
+ {
+
+ loop_vinfo = unrolled_vinfo;
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "unrolling succeeded with factor = %d\n",
+ loop_vinfo->suggested_unroll_factor);
+
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "unrolling failed with factor = %d\n",
+ unrolled_vinfo->suggested_unroll_factor);
+ }
+ }
+ else
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "target determined unrolling is not profitable.\n");
+ }
+ loop_vinfo->loop->aux = NULL;
+ return loop_vinfo;
+}
+
/* If LOOP_VINFO is already a main loop, return it unmodified. Otherwise
try to reanalyze it as a main loop. Return the loop_vinfo on success
and null on failure. */
@@ -2904,6 +3044,8 @@ vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
bool fatal = false;
bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
loop->aux = NULL;
+ main_loop_vinfo = vect_try_unrolling (main_loop_vinfo, n_stmts);
+
if (!res)
{
if (dump_enabled_p ())
@@ -3038,6 +3180,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
if (res)
{
+ /* Only try unrolling main loops. */
+ if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+ loop_vinfo = vect_try_unrolling (loop_vinfo, &n_stmts);
+
LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
vectorized_loops++;
@@ -3056,13 +3202,26 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
/* Keep trying to roll back vectorization attempts while the
loop_vec_infos they produced were worse than this one. */
vec<loop_vec_info> &vinfos = first_loop_vinfo->epilogue_vinfos;
+ poly_uint64 vinfo_vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+ poly_uint64 first_vinfo_vf
+ = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
while (!vinfos.is_empty ()
+ && (known_lt (vinfo_vf, first_vinfo_vf)
+ || (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+ && maybe_eq (vinfo_vf, first_vinfo_vf)))
&& vect_joust_loop_vinfos (loop_vinfo, vinfos.last ()))
{
gcc_assert (vect_epilogues);
delete vinfos.pop ();
}
+ /* Check if we may want to replace the current first_loop_vinfo
+ with the new loop, but only if they have different vector
+ modes. If they have the same vector mode this means the main
+ loop is an unrolled loop and we are trying to vectorize the
+ epilogue using the same vector mode but with a lower
+ vectorization factor. */
if (vinfos.is_empty ()
+ && loop_vinfo->vector_mode != first_loop_vinfo->vector_mode
&& vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
{
loop_vec_info main_loop_vinfo
@@ -3105,14 +3264,34 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
/* For now only allow one epilogue loop. */
&& first_loop_vinfo->epilogue_vinfos.is_empty ())
{
- first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
- poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
- gcc_assert (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
- || maybe_ne (lowest_th, 0U));
- /* Keep track of the known smallest versioning
- threshold. */
- if (ordered_p (lowest_th, th))
- lowest_th = ordered_min (lowest_th, th);
+ /* Ensure the epilogue has a smaller VF than the main loop or
+ uses predication and has the same VF. */
+ if (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+ LOOP_VINFO_VECT_FACTOR (first_loop_vinfo))
+ || (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+ && maybe_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+ LOOP_VINFO_VECT_FACTOR (first_loop_vinfo))))
+ {
+ first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
+ poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
+ gcc_assert (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+ || maybe_ne (lowest_th, 0U));
+ /* Keep track of the known smallest versioning
+ threshold. */
+ if (ordered_p (lowest_th, th))
+ lowest_th = ordered_min (lowest_th, th);
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Will not use %s mode as an"
+ " epilogue, since it leads to a higher"
+ " vectorization factor than main loop\n",
+ GET_MODE_NAME (loop_vinfo->vector_mode));
+ delete loop_vinfo;
+ loop_vinfo = opt_loop_vec_info::success (NULL);
+ }
}
else
{
@@ -3153,13 +3332,32 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
/* Handle the case that the original loop can use partial
vectorization, but want to only adopt it for the epilogue.
- The retry should be in the same mode as original. */
+ The retry should be in the same mode as original.
+ Also handle the case where we have unrolled the main loop and want to
+ retry all vector modes again for the epilogues, since the VF is now
+ at least twice as high as the current vector mode. */
if (vect_epilogues
&& loop_vinfo
- && LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo))
+ && (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo)
+ || loop_vinfo->suggested_unroll_factor > 1))
{
- gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+ gcc_assert ((LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+ || loop_vinfo->suggested_unroll_factor > 1)
&& !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo));
+ /* If we are unrolling, try all VECTOR_MODES for the epilogue. */
+ if (loop_vinfo->suggested_unroll_factor > 1)
+ {
+ next_vector_mode = vector_modes[0];
+ mode_i = 1;
+
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Re-trying analysis with vector mode"
+ " %s for epilogues after unrolling.\n",
+ GET_MODE_NAME (next_vector_mode));
+ continue;
+ }
+
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"***** Re-trying analysis with same vector mode"
@@ -4222,8 +4420,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
}
/* Complete the target-specific cost calculations. */
- finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), &vec_prologue_cost,
- &vec_inside_cost, &vec_epilogue_cost);
+ finish_cost (loop_vinfo, LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+ &vec_prologue_cost, &vec_inside_cost, &vec_epilogue_cost);
vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
@@ -7212,7 +7410,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
participating. */
if (ncopies > 1
&& (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
- && reduc_chain_length == 1)
+ && reduc_chain_length == 1
+ && loop_vinfo->suggested_unroll_factor == 1)
single_defuse_cycle = true;
if (single_defuse_cycle || lane_reduc_code_p)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 024a1c38a2342246d7891db1de5f1d6e6458d5dd..dce8b953d306b90185ffe75c637f1fdb998aa953 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5405,7 +5405,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
while (si < li_scalar_costs.length ()
&& li_scalar_costs[si].first == sl);
unsigned dummy;
- finish_cost (scalar_target_cost_data, &dummy, &scalar_cost, &dummy);
+ finish_cost (bb_vinfo, scalar_target_cost_data, &dummy, &scalar_cost,
+ &dummy);
destroy_cost_data (scalar_target_cost_data);
/* Complete the target-specific vector cost calculation. */
@@ -5418,7 +5419,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
}
while (vi < li_vector_costs.length ()
&& li_vector_costs[vi].first == vl);
- finish_cost (vect_target_cost_data, &vec_prologue_cost,
+ finish_cost (bb_vinfo, vect_target_cost_data, &vec_prologue_cost,
&vec_inside_cost, &vec_epilogue_cost);
destroy_cost_data (vect_target_cost_data);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index c4c5678e7f1abafc25c465319dbacf3ef50f0ae9..e91fb6691857cbcc0b1c087d6de35164a7c75e48 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -621,6 +621,13 @@ public:
about the reductions that generated them. */
hash_map<tree, vect_reusable_accumulator> reusable_accumulators;
+ /* The number of times that the target suggested we unroll the vector loop
+ in order to promote more ILP. This value will be used to re-analyze the
+ loop for vectorization and if successful the value will be folded into
+ vectorization_factor (and therefore exactly divides
+ vectorization_factor). */
+ unsigned int suggested_unroll_factor;
+
/* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
if there is no particular limit. */
unsigned HOST_WIDE_INT max_vectorization_factor;
@@ -1570,10 +1577,10 @@ add_stmt_cost (vec_info *vinfo, void *data, stmt_info_for_cost *i)
/* Alias targetm.vectorize.finish_cost. */
static inline void
-finish_cost (void *data, unsigned *prologue_cost,
+finish_cost (class vec_info *vinfo, void *data, unsigned *prologue_cost,
unsigned *body_cost, unsigned *epilogue_cost)
{
- targetm.vectorize.finish_cost (data, prologue_cost, body_cost, epilogue_cost);
+ targetm.vectorize.finish_cost (vinfo, data, prologue_cost, body_cost, epilogue_cost);
}
/* Alias targetm.vectorize.destroy_cost_data. */