public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops)
Date: Tue, 29 Oct 2019 13:45:00 -0000	[thread overview]
Message-ID: <nycvar.YFH.7.76.1910291431230.5566@zhemvz.fhfr.qr> (raw)
In-Reply-To: <2f8c538b-8dec-8e85-9a52-9d139e73bdec@arm.com>

[-- Attachment #1: Type: text/plain, Size: 15126 bytes --]

On Tue, 29 Oct 2019, Andre Vieira (lists) wrote:

> Hi richi,
> 
> 
> Committed the patch with the suggested change in revision r277569.
> 
> I will look at the follow-up, from what I understood the idea was to add a
> field to the dr_vec_info where we keep the "extra-offset" that represents the
> position in the DR wrt the current loop/epilogue. So when we advance the DR,
> we change that field and not the DR_OFFSET? That way we always keep the
> "original state".

Yes.  That would apply to the adjustment done by prologue peeling as
well.  Still wonder how gather/scatter get by not having any adjustment ;)
Ah, I guess they simply are not affected by peeling since they get
their DR_OFFSET & friends computed in the loop (loaded from memory).
But DR_REF needs adjustment because it tries to walk def-stmts
(repeatedly in both analysis and transform phase).

Richard.

> Cheers,
> Andre
> 
> gcc/ChangeLog:
> 2019-10-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>     PR 88915
>     * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
>     * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
>     and make the valueize function pointer also take a void pointer.
>     * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
>     around vn_valueize, to call it without a context.
>     (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
>     * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
>     (~_loop_vec_info): Release epilogue_vinfos.
>     (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>     number of iterations of epilogue.
>     (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>     vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
>     versioning threshold needed for main loop.
>     (vect_analyze_loop): Likewise.
>     (find_in_mapping): New helper function.
>     (update_epilogue_loop_vinfo): New function.
>     (vect_transform_loop): When vectorizing epilogues re-use analysis done
>     on main loop and call update_epilogue_loop_vinfo to update it.
>     * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>     stmts on loop preheader edge.
>     (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>     we decided to vectorize epilogues.  Update epilogues NITERS and
>     construct ADVANCE to update epilogues data references where needed.
>     * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
>     (vect_do_peeling, vect_update_inits_of_drs,
>      determine_peel_for_niter, vect_analyze_loop): Add or update declarations.
>     * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>     created loop_vec_info's for epilogues when available.  Otherwise 
> analyse
>     epilogue separately.
> 
> 
> 
> Cheers,
> Andre
> 
> On 29/10/2019 11:48, Richard Biener wrote:
> > On Mon, 28 Oct 2019, Andre Vieira (lists) wrote:
> > 
> >> Hi,
> >>
> >> Reworked according to your comments, see inline for clarification.
> >>
> >> Is this OK for trunk?
> > 
> > +             gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo);
> > +             while (seq)
> > +               {
> > +                 stmt_worklist.safe_push (seq);
> > +                 seq = seq->next;
> > +               }
> > 
> > you're supposed to do to the following, not access the ->next
> > implementation detail:
> > 
> >      for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
> > gsi_next (&gsi))
> >        stmt_worklist.safe_push (gsi_stmt (gsi));
> > 
> > 
> > +      /* Data references for gather loads and scatter stores do not use
> > the
> > +        updated offset we set using ADVANCE.  Instead we have to make
> > sure
> > the
> > +        reference in the data references point to the corresponding copy
> > of
> > +        the original in the epilogue.  */
> > +      if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> > +       {
> > +         DR_REF (dr)
> > +           = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE,
> > +                                    &find_in_mapping, &mapping);
> > +         DR_BASE_ADDRESS (dr)
> > +           = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE,
> > NULL_TREE,
> > +                                    &find_in_mapping, &mapping);
> > +       }
> > 
> > Hmm.  So for other DRs we account for the previous vector loop
> > by adjusting DR_OFFSET?  But STMT_VINFO_GATHER_SCATTER_P ends up
> > using (unconditionally) DR_REF here?  In that case it seems
> > best to adjust DR_REF only but NULL out DR_BASE_ADDRESS and
> > DR_OFFSET?  I wonder how prologue peeling deals with
> > STMT_VINFO_GATHER_SCATTER_P ... I see the caller of
> > vect_update_init_of_dr there does nothing for STMT_VINFO_GATHER_SCATTER_P.
> > 
> > I wonder if (as followup to not delay this further) we can
> > "offload" all the DR adjustment by storing ADVANCE in dr_vec_info
> > and accounting for that when we create the dataref pointers in
> > vectorizable_load/store?  That way we could avoid saving/restoring
> > DR_OFFSET as well.
> > 
> > So, the patch is OK with the sequence iteration fixed.  I think
> > sorting out the above can be done as followup.
> > 
> > Thanks,
> > Richard.
> > 
> >> gcc/ChangeLog:
> >> 2019-10-28  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> >>
> >>      PR 88915
> >>      * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
> >>      * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
> >>      and make the valueize function pointer also take a void pointer.
> >>      * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> >>      around vn_valueize, to call it without a context.
> >>      (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> >>      * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> >>      (~_loop_vec_info): Release epilogue_vinfos.
> >>      (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> >>      number of iterations of epilogue.
> >>      (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> >>      vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
> >>      versioning threshold needed for main loop.
> >>      (vect_analyze_loop): Likewise.
> >>      (find_in_mapping): New helper function.
> >>      (update_epilogue_loop_vinfo): New function.
> >>      (vect_transform_loop): When vectorizing epilogues re-use analysis done
> >>      on main loop and call update_epilogue_loop_vinfo to update it.
> >>      * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
> >>      stmts on loop preheader edge.
> >>      (vect_do_peeling): Enable skip-vectors when doing loop versioning if
> >>      we decided to vectorize epilogues.  Update epilogues NITERS and
> >>      construct ADVANCE to update epilogues data references where needed.
> >>      * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
> >>      (vect_do_peeling, vect_update_inits_of_drs,
> >>      determine_peel_for_niter, vect_analyze_loop): Add or update
> >>      declarations.
> >>      * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
> >>      created loop_vec_info's for epilogues when available.  Otherwise
> >> analyse
> >>      epilogue separately.
> >>
> >>
> >>
> >> Cheers,
> >> Andre
> >>
> >> On 28/10/2019 14:16, Richard Biener wrote:
> >>> On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> This is the reworked patch after your comments.
> >>>>
> >>>> I have moved the epilogue check into the analysis form disguised under
> >>>> '!epilogue_vinfos.is_empty ()'.  This because I realized that I am doing
> >>>> the
> >>>> "lowest threshold" check there.
> >>>>
> >>>> The only place where we may reject an epilogue_vinfo is when we know the
> >>>> number of scalar iterations and we realize the number of iterations left
> >>>> after
> >>>> the main loop are not enough to enter the vectorized epilogue so we
> >>>> optimize
> >>>> away that code-gen.  The only way we know this to be true is if the
> >>>> number
> >>>> of
> >>>> scalar iterations are known and the peeling for alignment is known. So we
> >>>> know
> >>>> we will enter the main loop regardless, so whether the threshold we use
> >>>> is
> >>>> for
> >>>> a lower VF or not it shouldn't matter as much, I would even like to think
> >>>> that
> >>>> check isn't done, but I am not sure... Might be worth checking as an
> >>>> optimization.
> >>>>
> >>>>
> >>>> Is this OK for trunk?
> >>>
> >>> +      for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
> >>> +          !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
> >>> +       {
> >>> ..
> >>> +         if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
> >>> +           pattern_worklist.safe_push (stmt_vinfo);
> >>> +
> >>> +         related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> >>> +         while (related_vinfo && related_vinfo != stmt_vinfo)
> >>> +           {
> >>>
> >>> I think PHIs cannot have patterns.  You can assert
> >>> that STMT_VINFO_RELATED_STMT is NULL I think.
> >>
> >> Done.
> >>>
> >>> +         related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> >>> +         while (related_vinfo && related_vinfo != stmt_vinfo)
> >>> +           {
> >>> +             related_worklist.safe_push (related_vinfo);
> >>> +             /* Set BB such that the assert in
> >>> +               'get_initial_def_for_reduction' is able to determine that
> >>> +               the BB of the related stmt is inside this loop.  */
> >>> +             gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
> >>> +                            gimple_bb (new_stmt));
> >>> +             related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
> >>> +           }
> >>>
> >>> do we really keep references to "nested" patterns?  Thus, do you
> >>> need this loop?
> >>
> >> Changed and added asserts.  They didn't trigger so I suppose you are right,
> >> I
> >> didn't know at the time whether it was possible, so I just operated on the
> >> side of caution.  Can remove the asserts and so on if you want.
> >>>
> >>> +  /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
> >>> +     original main loop and thus need to be updated to refer to the
> >>> cloned
> >>> +     variables used in the epilogue.  */
> >>> +  for (unsigned i = 0; i < pattern_worklist.length (); ++i)
> >>> +    {
> >>> ...
> >>> +                 op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
> >>> +                                        &find_in_mapping, &mapping);
> >>> +                 gimple_set_op (seq, j, op);
> >>>
> >>> you do this for the pattern-def seq but not for the related one.
> >>> I guess you ran into this for COND_EXPR conditions.  I wondered
> >>> to use a shared worklist for both the def-seq and the main pattern
> >>> stmt or at least to split out the replacement so you can share it.
> >>
> >> I think that was it yeah, reworked it now to use the same list. Less code,
> >> thanks!
> >>>
> >>> +      /* Data references for gather loads and scatter stores do not use
> >>> the
> >>> +        updated offset we set using ADVANCE.  Instead we have to make
> >>> sure the
> >>> +        reference in the data references point to the corresponding copy
> >>> of
> >>> +        the original in the epilogue.  */
> >>> +      if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> >>> +       {
> >>> +         int j;
> >>> +         if (TREE_CODE (DR_REF (dr)) == MEM_REF)
> >>> +           j = 0;
> >>> +         else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF)
> >>> +           j = 1;
> >>> +         else
> >>> +           gcc_unreachable ();
> >>> +
> >>> +         if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j)))
> >>> +           {
> >>> +             DR_REF (dr) = unshare_expr (DR_REF (dr));
> >>> +             TREE_OPERAND (DR_REF (dr), j) = *new_op;
> >>> +           }
> >>>
> >>> huh, do you really only ever see MEM_REF or ARRAY_REF here?
> >>> I would guess using simplify_replace_tree is safer.
> >>> There's also DR_BASE_ADDRESS - we seem to leave the DRs partially
> >>> updated, is that correct?
> >>
> >> Yeah can use simplify_replace_tree indeed.  And I have changed it so it
> >> updates DR_BASE_ADDRESS.  I think DR_BASE_ADDRESS never actually changed in
> >> the way we use data_references... Either way, replacing them if they do
> >> change
> >> is cleaner and more future proof.
> >>>
> >>> Otherwise looks OK to me.
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>
> >>>> gcc/ChangeLog:
> >>>> 2019-10-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> >>>>
> >>>>       PR 88915
> >>>>       * tree-ssa-loop-niter.h (simplify_replace_tree): Change
> >>>>       declaration.
> >>>>       * tree-ssa-loop-niter.c (simplify_replace_tree): Add context
> >>>>       parameter
> >>>>       and make the valueize function pointer also take a void pointer.
> >>>>       * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
> >>>>       around vn_valueize, to call it without a context.
> >>>>       (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
> >>>>       * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
> >>>>       (~_loop_vec_info): Release epilogue_vinfos.
> >>>>       (vect_analyze_loop_costing): Use knowledge of main VF to estimate
> >>>>       number of iterations of epilogue.
> >>>>       (vect_analyze_loop_2): Adapt to analyse main loop for all supported
> >>>>       vector sizes when vect-epilogues-nomask=1.  Also keep track of
> >>>>       lowest
> >>>>       versioning threshold needed for main loop.
> >>>>       (vect_analyze_loop): Likewise.
> >>>>       (find_in_mapping): New helper function.
> >>>>       (update_epilogue_loop_vinfo): New function.
> >>>>       (vect_transform_loop): When vectorizing epilogues re-use analysis
> >>>>       done
> >>>>       on main loop and call update_epilogue_loop_vinfo to update it.
> >>>>       * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer
> >>>>       insert
> >>>>       stmts on loop preheader edge.
> >>>>       (vect_do_peeling): Enable skip-vectors when doing loop versioning
> >>>>       if
> >>>>       we decided to vectorize epilogues.  Update epilogues NITERS and
> >>>>       construct ADVANCE to update epilogues data references where needed.
> >>>>       * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
> >>>>       (vect_do_peeling, vect_update_inits_of_drs,
> >>>>       determine_peel_for_niter, vect_analyze_loop): Add or update
> >>>>       declarations.
> >>>>       * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use
> >>>>       already
> >>>>       created loop_vec_info's for epilogues when available.  Otherwise
> >>>> analyse
> >>>>       epilogue separately.
> >>>>
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Andre
> >>>>
> >>>
> >>
> >>
> > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

  reply	other threads:[~2019-10-29 13:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 17:17 [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Andre Vieira (lists)
2019-08-26 13:17 ` Richard Biener
2019-09-18 11:11   ` Andre Vieira (lists)
2019-10-08 13:16   ` Andre Vieira (lists)
2019-10-09  9:06     ` Richard Biener
2019-10-10 14:01       ` Andre Vieira (lists)
2019-10-11 13:01         ` Richard Biener
2019-10-22 12:53           ` Andre Vieira (lists)
2019-10-22 14:05             ` Richard Biener
2019-10-25 16:18               ` Andre Vieira (lists)
2019-10-28 13:16                 ` Richard Biener
2019-10-22 18:07             ` Richard Sandiford
2019-10-25 16:20               ` Andre Vieira (lists)
2019-10-25 16:27                 ` Andre Vieira (lists)
2019-10-28 14:18                   ` Richard Biener
2019-10-28 18:37                     ` Andre Vieira (lists)
2019-10-29 12:12                       ` Richard Biener
2019-10-29 13:17                         ` [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops) Andre Vieira (lists)
2019-10-29 13:45                           ` Richard Biener [this message]
2019-09-04 15:35 ` [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.76.1910291431230.5566@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).