From: Richard Biener <rguenther@suse.de>
To: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops)
Date: Tue, 29 Oct 2019 13:45:00 -0000 [thread overview]
Message-ID: <nycvar.YFH.7.76.1910291431230.5566@zhemvz.fhfr.qr> (raw)
In-Reply-To: <2f8c538b-8dec-8e85-9a52-9d139e73bdec@arm.com>
[-- Attachment #1: Type: text/plain, Size: 15126 bytes --]
On Tue, 29 Oct 2019, Andre Vieira (lists) wrote:
> Hi richi,
>
>
> Committed the patch with the suggested change in revision r277569.
>
> I will look at the follow-up, from what I understood the idea was to add a
> field to the dr_vec_info where we keep the "extra-offset" that represents the
> position in the DR wrt the current loop/epilogue. So when we advance the DR,
> we change that field and not the DR_OFFSET? That way we always keep the
> "original state".
Yes. That would apply to the adjustment done by prologue peeling as
well. Still wonder how gather/scatter get by not having any adjustment ;)
Ah, I guess they simply are not affected by peeling since they get
their DR_OFFSET & friends computed in the loop (loaded from memory).
But DR_REF needs adjustment because it tries to walk def-stmts
(repeatedly in both analysis and transform phase).
Richard.
> Cheers,
> Andre
>
> gcc/ChangeLog:
> 2019-10-29 Andre Vieira <andre.simoesdiasvieira@arm.com>
>
> PR 88915
> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
> and make the valueize function pointer also take a void pointer.
> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> around vn_valueize, to call it without a context.
> (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> (~_loop_vec_info): Release epilogue_vinfos.
> (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> number of iterations of epilogue.
> (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest
> versioning threshold needed for main loop.
> (vect_analyze_loop): Likewise.
> (find_in_mapping): New helper function.
> (update_epilogue_loop_vinfo): New function.
> (vect_transform_loop): When vectorizing epilogues re-use analysis done
> on main loop and call update_epilogue_loop_vinfo to update it.
> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
> stmts on loop preheader edge.
> (vect_do_peeling): Enable skip-vectors when doing loop versioning if
> we decided to vectorize epilogues. Update epilogues NITERS and
> construct ADVANCE to update epilogues data references where needed.
> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
> (vect_do_peeling, vect_update_inits_of_drs,
> determine_peel_for_niter, vect_analyze_loop): Add or update declarations.
> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
> created loop_vec_info's for epilogues when available. Otherwise
> analyse
> epilogue separately.
>
>
>
> Cheers,
> Andre
>
> On 29/10/2019 11:48, Richard Biener wrote:
> > On Mon, 28 Oct 2019, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> Reworked according to your comments, see inline for clarification.
> >>
> >> Is this OK for trunk?
> >
> > + gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo);
> > + while (seq)
> > + {
> > + stmt_worklist.safe_push (seq);
> > + seq = seq->next;
> > + }
> >
> > you're supposed to do to the following, not access the ->next
> > implementation detail:
> >
> > for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
> > gsi_next (&gsi))
> > stmt_worklist.safe_push (gsi_stmt (gsi));
> >
> >
> > + /* Data references for gather loads and scatter stores do not use
> > the
> > + updated offset we set using ADVANCE. Instead we have to make
> > sure
> > the
> > + reference in the data references point to the corresponding copy
> > of
> > + the original in the epilogue. */
> > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> > + {
> > + DR_REF (dr)
> > + = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE,
> > + &find_in_mapping, &mapping);
> > + DR_BASE_ADDRESS (dr)
> > + = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE,
> > NULL_TREE,
> > + &find_in_mapping, &mapping);
> > + }
> >
> > Hmm. So for other DRs we account for the previous vector loop
> > by adjusting DR_OFFSET? But STMT_VINFO_GATHER_SCATTER_P ends up
> > using (unconditionally) DR_REF here? In that case it seems
> > best to adjust DR_REF only but NULL out DR_BASE_ADDRESS and
> > DR_OFFSET? I wonder how prologue peeling deals with
> > STMT_VINFO_GATHER_SCATTER_P ... I see the caller of
> > vect_update_init_of_dr there does nothing for STMT_VINFO_GATHER_SCATTER_P.
> >
> > I wonder if (as followup to not delay this further) we can
> > "offload" all the DR adjustment by storing ADVANCE in dr_vec_info
> > and accounting for that when we create the dataref pointers in
> > vectorizable_load/store? That way we could avoid saving/restoring
> > DR_OFFSET as well.
> >
> > So, the patch is OK with the sequence iteration fixed. I think
> > sorting out the above can be done as followup.
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >> 2019-10-28 Andre Vieira <andre.simoesdiasvieira@arm.com>
> >>
> >> PR 88915
> >> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
> >> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
> >> and make the valueize function pointer also take a void pointer.
> >> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> >> around vn_valueize, to call it without a context.
> >> (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> >> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> >> (~_loop_vec_info): Release epilogue_vinfos.
> >> (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> >> number of iterations of epilogue.
> >> (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> >> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest
> >> versioning threshold needed for main loop.
> >> (vect_analyze_loop): Likewise.
> >> (find_in_mapping): New helper function.
> >> (update_epilogue_loop_vinfo): New function.
> >> (vect_transform_loop): When vectorizing epilogues re-use analysis done
> >> on main loop and call update_epilogue_loop_vinfo to update it.
> >> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
> >> stmts on loop preheader edge.
> >> (vect_do_peeling): Enable skip-vectors when doing loop versioning if
> >> we decided to vectorize epilogues. Update epilogues NITERS and
> >> construct ADVANCE to update epilogues data references where needed.
> >> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
> >> (vect_do_peeling, vect_update_inits_of_drs,
> >> determine_peel_for_niter, vect_analyze_loop): Add or update
> >> declarations.
> >> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
> >> created loop_vec_info's for epilogues when available. Otherwise
> >> analyse
> >> epilogue separately.
> >>
> >>
> >>
> >> Cheers,
> >> Andre
> >>
> >> On 28/10/2019 14:16, Richard Biener wrote:
> >>> On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> This is the reworked patch after your comments.
> >>>>
> >>>> I have moved the epilogue check into the analysis form disguised under
> >>>> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing
> >>>> the
> >>>> "lowest threshold" check there.
> >>>>
> >>>> The only place where we may reject an epilogue_vinfo is when we know the
> >>>> number of scalar iterations and we realize the number of iterations left
> >>>> after
> >>>> the main loop are not enough to enter the vectorized epilogue so we
> >>>> optimize
> >>>> away that code-gen. The only way we know this to be true is if the
> >>>> number
> >>>> of
> >>>> scalar iterations are known and the peeling for alignment is known. So we
> >>>> know
> >>>> we will enter the main loop regardless, so whether the threshold we use
> >>>> is
> >>>> for
> >>>> a lower VF or not it shouldn't matter as much, I would even like to think
> >>>> that
> >>>> check isn't done, but I am not sure... Might be worth checking as an
> >>>> optimization.
> >>>>
> >>>>
> >>>> Is this OK for trunk?
> >>>
> >>> + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
> >>> + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
> >>> + {
> >>> ..
> >>> + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
> >>> + pattern_worklist.safe_push (stmt_vinfo);
> >>> +
> >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> >>> + while (related_vinfo && related_vinfo != stmt_vinfo)
> >>> + {
> >>>
> >>> I think PHIs cannot have patterns. You can assert
> >>> that STMT_VINFO_RELATED_STMT is NULL I think.
> >>
> >> Done.
> >>>
> >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> >>> + while (related_vinfo && related_vinfo != stmt_vinfo)
> >>> + {
> >>> + related_worklist.safe_push (related_vinfo);
> >>> + /* Set BB such that the assert in
> >>> + 'get_initial_def_for_reduction' is able to determine that
> >>> + the BB of the related stmt is inside this loop. */
> >>> + gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
> >>> + gimple_bb (new_stmt));
> >>> + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
> >>> + }
> >>>
> >>> do we really keep references to "nested" patterns? Thus, do you
> >>> need this loop?
> >>
> >> Changed and added asserts. They didn't trigger so I suppose you are right,
> >> I
> >> didn't know at the time whether it was possible, so I just operated on the
> >> side of caution. Can remove the asserts and so on if you want.
> >>>
> >>> + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
> >>> + original main loop and thus need to be updated to refer to the
> >>> cloned
> >>> + variables used in the epilogue. */
> >>> + for (unsigned i = 0; i < pattern_worklist.length (); ++i)
> >>> + {
> >>> ...
> >>> + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
> >>> + &find_in_mapping, &mapping);
> >>> + gimple_set_op (seq, j, op);
> >>>
> >>> you do this for the pattern-def seq but not for the related one.
> >>> I guess you ran into this for COND_EXPR conditions. I wondered
> >>> to use a shared worklist for both the def-seq and the main pattern
> >>> stmt or at least to split out the replacement so you can share it.
> >>
> >> I think that was it yeah, reworked it now to use the same list. Less code,
> >> thanks!
> >>>
> >>> + /* Data references for gather loads and scatter stores do not use
> >>> the
> >>> + updated offset we set using ADVANCE. Instead we have to make
> >>> sure the
> >>> + reference in the data references point to the corresponding copy
> >>> of
> >>> + the original in the epilogue. */
> >>> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> >>> + {
> >>> + int j;
> >>> + if (TREE_CODE (DR_REF (dr)) == MEM_REF)
> >>> + j = 0;
> >>> + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF)
> >>> + j = 1;
> >>> + else
> >>> + gcc_unreachable ();
> >>> +
> >>> + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j)))
> >>> + {
> >>> + DR_REF (dr) = unshare_expr (DR_REF (dr));
> >>> + TREE_OPERAND (DR_REF (dr), j) = *new_op;
> >>> + }
> >>>
> >>> huh, do you really only ever see MEM_REF or ARRAY_REF here?
> >>> I would guess using simplify_replace_tree is safer.
> >>> There's also DR_BASE_ADDRESS - we seem to leave the DRs partially
> >>> updated, is that correct?
> >>
> >> Yeah can use simplify_replace_tree indeed. And I have changed it so it
> >> updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed in
> >> the way we use data_references... Either way, replacing them if they do
> >> change
> >> is cleaner and more future proof.
> >>>
> >>> Otherwise looks OK to me.
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>
> >>>> gcc/ChangeLog:
> >>>> 2019-10-25 Andre Vieira <andre.simoesdiasvieira@arm.com>
> >>>>
> >>>> PR 88915
> >>>> * tree-ssa-loop-niter.h (simplify_replace_tree): Change
> >>>> declaration.
> >>>> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context
> >>>> parameter
> >>>> and make the valueize function pointer also take a void pointer.
> >>>> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> >>>> around vn_valueize, to call it without a context.
> >>>> (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> >>>> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> >>>> (~_loop_vec_info): Release epilogue_vinfos.
> >>>> (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> >>>> number of iterations of epilogue.
> >>>> (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> >>>> vector sizes when vect-epilogues-nomask=1. Also keep track of
> >>>> lowest
> >>>> versioning threshold needed for main loop.
> >>>> (vect_analyze_loop): Likewise.
> >>>> (find_in_mapping): New helper function.
> >>>> (update_epilogue_loop_vinfo): New function.
> >>>> (vect_transform_loop): When vectorizing epilogues re-use analysis
> >>>> done
> >>>> on main loop and call update_epilogue_loop_vinfo to update it.
> >>>> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer
> >>>> insert
> >>>> stmts on loop preheader edge.
> >>>> (vect_do_peeling): Enable skip-vectors when doing loop versioning
> >>>> if
> >>>> we decided to vectorize epilogues. Update epilogues NITERS and
> >>>> construct ADVANCE to update epilogues data references where needed.
> >>>> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
> >>>> (vect_do_peeling, vect_update_inits_of_drs,
> >>>> determine_peel_for_niter, vect_analyze_loop): Add or update
> >>>> declarations.
> >>>> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use
> >>>> already
> >>>> created loop_vec_info's for epilogues when available. Otherwise
> >>>> analyse
> >>>> epilogue separately.
> >>>>
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Andre
> >>>>
> >>>
> >>
> >>
> >
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2019-10-29 13:33 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-23 17:17 [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Andre Vieira (lists)
2019-08-26 13:17 ` Richard Biener
2019-09-18 11:11 ` Andre Vieira (lists)
2019-10-08 13:16 ` Andre Vieira (lists)
2019-10-09 9:06 ` Richard Biener
2019-10-10 14:01 ` Andre Vieira (lists)
2019-10-11 13:01 ` Richard Biener
2019-10-22 12:53 ` Andre Vieira (lists)
2019-10-22 14:05 ` Richard Biener
2019-10-25 16:18 ` Andre Vieira (lists)
2019-10-28 13:16 ` Richard Biener
2019-10-22 18:07 ` Richard Sandiford
2019-10-25 16:20 ` Andre Vieira (lists)
2019-10-25 16:27 ` Andre Vieira (lists)
2019-10-28 14:18 ` Richard Biener
2019-10-28 18:37 ` Andre Vieira (lists)
2019-10-29 12:12 ` Richard Biener
2019-10-29 13:17 ` [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops) Andre Vieira (lists)
2019-10-29 13:45 ` Richard Biener [this message]
2019-09-04 15:35 ` [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.76.1910291431230.5566@zhemvz.fhfr.qr \
--to=rguenther@suse.de \
--cc=andre.simoesdiasvieira@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).