From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization
Date: Mon, 14 Jun 2021 09:10:29 +0100 [thread overview]
Message-ID: <4925fee5-dcea-0c14-388c-85c881ffd918@arm.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2105201147390.9200@zhemvz.fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 4929 bytes --]
Hi,
On 20/05/2021 11:22, Richard Biener wrote:
> On Mon, 17 May 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> So this is my second attempt at finding a way to improve how we generate the
>> vector IV's and teach the vectorizer to share them between main loop and
>> epilogues. On IRC we discussed my idea to use the loop's control_iv, but that
>> was a terrible idea and I quickly threw it in the bin. The main problem, that
>> for some reason I failed to see, was that the control_iv increases by 's' and
>> the datarefs by 's' * NELEMENTS where 's' is usually 1 and NELEMENTs the
>> amount of elements we handle per iteration. That means the epilogue loops
>> would have to start from the last loop's IV * the last loop's NELEMENT's and
>> that would just cause a mess.
>>
>> Instead I started to think about creating IV's for the datarefs and what I
>> thought worked best was to create these in scalar before peeling. That way the
>> peeling mechanisms takes care of the duplication of these for the vector and
>> scalar epilogues and it also takes care of adding phi-nodes for the
>> skip_vector paths.
> How does this work for if-converted loops where we use the
> non-if-converted scalar loop for (scalar) peeling but the
> if-converted scalar loop for vectorized epilogues? I suppose
> you're only adjusting the if-converted copy.
True hadn't thought about this :(
>
>> These new IV's have two functions:
>> 1) 'vect_create_data_ref_ptr' can use them to:
>> a) if it's the main loop: replace the values of the 'initial' value of the
>> main loop's IV and the initial values in the skip_vector phi-nodes
>> b) Update the the skip_vector phi-nodes argument for the non-skip path with
>> the updated vector ptr.
> b) means the prologue IV will not be dead there so we actually need
> to compute it? I suppose IVOPTs could be teached to replace an
> IV with its final value (based on some other IV) when it's unused?
> Or does it already magically do good?
It does not and ...
>
>> 2) They are used for the scalar epilogue ensuring they share the same
>> datareference ptr.
>>
>> There are still a variety of 'hacky' elements here and a lot of testing to be
>> done, but I hope to be able to clean them away. One of the main issues I had
>> was that I had to skip a couple of checks and things for the added phi-nodes
>> and update statements as these do not have stmt_vec_info representation.
>> Though I'm not sure adding this representation at their creation was much
>> cleaner... It is something I could play around with but I thought this was a
>> good moment to ask you for input. For instance, maybe we could do this
>> transformation before analysis?
>>
>> Also be aware that because I create a IV for each dataref this leads to
>> regressions with SVE codegen for instance. NEON is able to use the post-index
>> addressing mode to increase each dr IV at access time, but SVE can't do this.
>> For this I don't know if maybe we could try to be smart and create shared
>> IV's. So rather than make them based on the actual vector ptr, use a shared
>> sizetype IV that can be shared among dr IV's with the same step. Or maybe this
>> is something for IVOPTs?
> Certainly IVOPTs could decide to use the newly created IVs in the
> scalar loops for the DRs therein as well. But since IVOPTs only
> considers a single loop at a time it will probably not pay too
> much attention and is only influenced by the out-of-loop uses of
> the final values of the IVs.
>
> My gut feeling tells me that whatever we do we'll have to look
> into improving IVOPTs to consider multiple loops.
So I redid the IV-sharing and it's looking a lot simpler and neater,
however it only shares IVs between vectorized loops and not scalar pro-
or epilogues. I am not certain IVOPTs will be able to deal with these,
as it has no knowledge of the number of iterations of each different
loop. So take for instance a prologue peeling for alignment loop and a
first main vectorization loop. To be able to reuse the IV's from the
prologue in the main vectorization loop it would need to know that the
initial start adress + PEELING_NITERS == base address for main
vectorization loop.
I'll starting testing this approach for correctness if there are no
major concerns. Though I suspect we will only want to turn this into a
patch once we have the IVOPTs work done to a point where it at least
doesn't regress codegen because of shared IVs and eventually we can look
at how to solve the sharing between vectorized and scalar loops.
A small nitpick on my own RFC. I will probably move the 'skip_e' to
outside of the map, as we only need one per loop_vinfo and not one per
DR. Initially I didnt even have this skip_e in, but was using the
creation of a dummy PHI node and then replacing it with the real thing
later. Though this made the code simpler, especially when inserting the
'init's stmt_list.
Kind regards,
Andre
[-- Attachment #2: iv_rfc_3.patch --]
[-- Type: text/plain, Size: 8422 bytes --]
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index b317df532a9a92a619de9572378437d09c632ab0..e7d0f1e657b1a0c9bec75799817242e0bc1d8282 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4909,11 +4909,27 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
offset, byte_offset);
if (new_stmt_list)
{
- if (pe)
- {
- new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmt_list);
- gcc_assert (!new_bb);
- }
+ if (loop_vinfo)
+ {
+ iv_struct *iv = loop_vinfo->dr_ivs->get (dr);
+ if (iv && iv->skip_e)
+ {
+ /* The initial value of the IV we are inserting here will be used
+ in a PHI-node that will be used as the initial IV for the next
+ vectorized epilogue, so we have to make sure we insert this
+ NEW_STMT_LIST before the SKIP_E. */
+ gimple_stmt_iterator g = gsi_last_bb (iv->skip_e->src);
+ if (gimple_code (gsi_stmt (g)) == GIMPLE_COND)
+ gsi_insert_seq_before (&g, new_stmt_list, GSI_NEW_STMT);
+ else
+ gsi_insert_seq_after (&g, new_stmt_list, GSI_NEW_STMT);
+ }
+ else
+ {
+ new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmt_list);
+ gcc_assert (!new_bb);
+ }
+ }
else
gsi_insert_seq_before (gsi, new_stmt_list, GSI_SAME_STMT);
}
@@ -4946,12 +4962,55 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+ /* If this is an epilogue, make sure to use the previous loop's IV for
+ the same DR as the init of this DR's IV. */
+ if (loop_vinfo
+ && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+ {
+ iv_struct *iv
+ = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)->dr_ivs->get (dr);
+ gcc_assert (iv);
+ aggr_ptr_init = iv->def;
+ }
+
create_iv (aggr_ptr_init,
fold_convert (aggr_ptr_type, iv_step),
aggr_ptr, loop, &incr_gsi, insert_after,
&indx_before_incr, &indx_after_incr);
incr = gsi_stmt (incr_gsi);
+ if (loop_vinfo)
+ {
+ iv_struct &iv = loop_vinfo->dr_ivs->get_or_insert (dr);
+ tree incr_lhs = gimple_get_lhs (incr);
+ if (iv.skip_e)
+ {
+ edge e;
+ edge_iterator ei;
+ tree val;
+ /* Create a skip PHI, use INCR_LHS as the value for the not-skip
+ path(s) and AGGR_PTR_INIT as the value for the skip path. */
+ iv.def = copy_ssa_name (incr_lhs);
+ gphi *skip_phi = create_phi_node (iv.def,
+ iv.skip_e->dest);
+
+ FOR_EACH_EDGE (e, ei, skip_phi->bb->preds)
+ {
+ if (e == iv.skip_e)
+ val = aggr_ptr_init;
+ else
+ val = incr_lhs;
+ add_phi_arg (skip_phi, val, e, UNKNOWN_LOCATION);
+ }
+ }
+ else
+ {
+ /* No skip PHI has been created, so we can use INCR_LHS as the
+ IV.DEF. */
+ iv.def = incr_lhs;
+ }
+ }
+
/* Copy the points-to information if it exists. */
if (DR_PTR_INFO (dr))
{
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 012f48bd4870125c820049b4fc70db0ef0759bdf..2475ab8b8263499f3471b64aa10b956523fe56df 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2554,8 +2554,7 @@ class loop *
vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
tree *niters_vector, tree *step_vector,
tree *niters_vector_mult_vf_var, int th,
- bool check_profitability, bool niters_no_overflow,
- tree *advance)
+ bool check_profitability, bool niters_no_overflow)
{
edge e, guard_e;
tree type = TREE_TYPE (niters), guard_cond;
@@ -3059,10 +3058,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
niters = PHI_RESULT (new_phi);
}
- /* Set ADVANCE to the number of iterations performed by the previous
- loop and its prologue. */
- *advance = niters;
-
if (!vect_epilogues_updated_niters)
{
/* Subtract the number of iterations performed by the vectorized loop
@@ -3090,6 +3085,21 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
adjust_vec.release ();
free_original_copy_tables ();
+ if (vect_epilogues && skip_e)
+ {
+ /* Keep track of the SKIP_E to later construct a phi-node per
+ * datareference. */
+ unsigned int i;
+ vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+ struct data_reference *dr;
+
+ FOR_EACH_VEC_ELT (datarefs, i, dr)
+ {
+ iv_struct &iv = loop_vinfo->dr_ivs->get_or_insert (dr);
+ iv.skip_e = skip_e;
+ }
+ }
+
return vect_epilogues ? epilog : NULL;
}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ba36348b835c25bc556da71a133f81f8a6fc3745..fc2468ebc001935d86fee9b06855a02832b444ad 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -901,6 +901,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
}
epilogue_vinfos.create (6);
+ dr_ivs = new hash_map<data_reference *, iv_struct> ();
}
/* Free all levels of rgroup CONTROLS. */
@@ -927,6 +928,7 @@ _loop_vec_info::~_loop_vec_info ()
delete ivexpr_map;
delete scan_map;
epilogue_vinfos.release ();
+ delete dr_ivs;
/* When we release an epiloge vinfo that we do not intend to use
avoid clearing AUX of the main loop which should continue to
@@ -9252,7 +9254,7 @@ find_in_mapping (tree t, void *context)
prologue of the main loop. */
static void
-update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
+update_epilogue_loop_vinfo (class loop *epilogue)
{
loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
auto_vec<gimple *> stmt_worklist;
@@ -9267,11 +9269,6 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
free (LOOP_VINFO_BBS (epilogue_vinfo));
LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
- /* Advance data_reference's with the number of iterations of the previous
- loop and its prologue. */
- vect_update_inits_of_drs (epilogue_vinfo, advance, PLUS_EXPR);
-
-
/* The EPILOGUE loop is a copy of the original loop so they share the same
gimple UIDs. In this loop we update the loop_vec_info of the EPILOGUE to
point to the copied statements. We also create a mapping of all LHS' in
@@ -9489,8 +9486,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
&step_vector, &niters_vector_mult_vf, th,
- check_profitability, niters_no_overflow,
- &advance);
+ check_profitability, niters_no_overflow);
if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
&& LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
@@ -9818,7 +9814,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
if (epilogue)
{
- update_epilogue_loop_vinfo (epilogue, advance);
+ update_epilogue_loop_vinfo (epilogue);
epilogue->simduid = loop->simduid;
epilogue->force_vectorize = loop->force_vectorize;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7dcb4cd0b46b03eef90705eed776d9c3dd797101..da7e2fe72a549634d3cb197e7136de995e20c2b0 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -545,6 +545,12 @@ typedef auto_vec<rgroup_controls> vec_loop_lens;
typedef auto_vec<std::pair<data_reference*, tree> > drs_init_vec;
+struct iv_struct
+{
+ tree def;
+ edge skip_e;
+};
+
/*-----------------------------------------------------------------*/
/* Info on vectorized loops. */
/*-----------------------------------------------------------------*/
@@ -756,6 +762,8 @@ public:
analysis. */
vec<_loop_vec_info *> epilogue_vinfos;
+ hash_map<data_reference *, iv_struct> *dr_ivs;
+
} *loop_vec_info;
/* Access Functions. */
@@ -1791,8 +1799,7 @@ class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
class loop *, edge);
class loop *vect_loop_versioning (loop_vec_info, gimple *);
extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
- tree *, tree *, tree *, int, bool, bool,
- tree *);
+ tree *, tree *, tree *, int, bool, bool);
extern void vect_prepare_for_masked_peels (loop_vec_info);
extern dump_user_location_t find_loop_location (class loop *);
extern bool vect_can_advance_ivs_p (loop_vec_info);
next prev parent reply other threads:[~2021-06-14 8:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-30 9:07 Andre Vieira (lists)
2021-05-04 9:56 ` Richard Biener
2021-05-05 11:34 ` Andre Vieira (lists)
2021-05-05 12:34 ` Richard Biener
2021-05-05 16:58 ` Andre Vieira (lists)
2021-05-07 12:05 ` Richard Biener
2021-05-17 8:14 ` Andre Vieira (lists)
2021-05-20 10:22 ` Richard Biener
2021-06-14 8:10 ` Andre Vieira (lists) [this message]
2021-06-14 8:37 ` Richard Biener
2021-06-14 10:57 ` Richard Biener
2021-06-16 10:24 ` Andre Vieira (lists)
2021-06-16 11:13 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4925fee5-dcea-0c14-388c-85c881ffd918@arm.com \
--to=andre.simoesdiasvieira@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).