public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops
Date: Fri, 11 Oct 2019 13:01:00 -0000	[thread overview]
Message-ID: <nycvar.YFH.7.76.1910111408390.5566@zhemvz.fhfr.qr> (raw)
In-Reply-To: <97dbaa36-107b-e078-05a5-0e8dc3700e9a@arm.com>

[-- Attachment #1: Type: text/plain, Size: 10923 bytes --]

On Thu, 10 Oct 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> After all the discussions and respins I now believe this patch is close to
> what we envisioned.
> 
> This patch achieves two things when vect-epilogues-nomask=1:
> 1) It analyzes the original loop for each supported vector size and saves this
> analysis per loop, as well as the vector sizes we know we can vectorize the
> loop for.
> 2) When loop versioning it uses the 'skip_vector' code path to vectorize the
> epilogue, and uses the lowest versioning threshold between the main and
> epilogue's.
> 
> As side effects of this patch I also changed ensure_base_align to only update
> the alignment if the new alignment is lower than the current one.  This
> function already did that if the object was a symbol, now it behaves this way
> for any object.
> 
> I bootstrapped this patch with both vect-epilogues-nomask turned on and off on
> x86_64 (AVX512) and aarch64.  Regression tests looked good.
> 
> Is this OK for trunk?

+
+  /* Keep track of vector sizes we know we can vectorize the epilogue 
with.  */
+  vector_sizes epilogue_vsizes;
 };

please don't enlarge struct loop, instead track this somewhere
in the vectorizer (in loop_vinfo?  I see you already have
epilogue_vinfos there - so the loop_vinfo simply lacks 
convenient access to the vector_size?)  I don't see any
use that could be trivially adjusted to look at a loop_vinfo
member instead.

For the vect_update_inits_of_drs this means that we'd possibly
do less CSE.  Not sure if really an issue.

You use LOOP_VINFO_EPILOGUE_P sometimes and sometimes
LOOP_VINFO_ORIG_LOOP_INFO, please change predicates to
LOOP_VINFO_EPILOGUE_P.

@@ -2466,15 +2461,62 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
niters, tree nitersm1,
   else
     niters_prolog = build_int_cst (type, 0);
     
+  loop_vec_info epilogue_vinfo = NULL;
+  if (vect_epilogues)
+    { 
...
+       vect_epilogues = false;
+    }
+

I don't understand what all this does - it clearly needs a comment.
Maybe the overall comment of the function should be amended with
an overview of how we handle [multiple] epilogue loop vectorization?

+
+      if (epilogue_any_upper_bound && prolog_peeling >= 0)
+       {
+         epilog->any_upper_bound = true;
+         epilog->nb_iterations_upper_bound = eiters + 1;
+       }
+

comment missing.  How can prolog_peeling be < 0?  We likely
didn't set the upper bound because we don't know it in the
case we skipped the vector loop (skip_vector)?  So make sure
to not introduce wrong-code issues here - maybe do this
optimization as followup?

 class loop *
-vect_loop_versioning (loop_vec_info loop_vinfo,
-                     unsigned int th, bool check_profitability,
-                     poly_uint64 versioning_threshold)
+vect_loop_versioning (loop_vec_info loop_vinfo)
 { 
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2988,10 +3151,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT 
(loop_vinfo);
   bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
+  poly_uint64 versioning_threshold
+    = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
   tree version_simd_if_cond
     = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);

-  if (check_profitability)
+  if (th >= vect_vf_for_cost (loop_vinfo)
+      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && !ordered_p (th, versioning_threshold))
     cond_expr = fold_build2 (GE_EXPR, boolean_type_node, 
scalar_loop_iters,
                             build_int_cst (TREE_TYPE (scalar_loop_iters),
                                            th - 1));

split out this refactoring - preapproved.

@@ -1726,7 +1729,13 @@ vect_analyze_loop_costing (loop_vec_info 
loop_vinfo)
       return 0;
     }

-  HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop);
+  HOST_WIDE_INT estimated_niter = -1;
+
+  if (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+    estimated_niter
+      = vect_vf_for_cost (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) - 1;
+  if (estimated_niter == -1)
+    estimated_niter = estimated_stmt_executions_int (loop);
   if (estimated_niter == -1)
     estimated_niter = likely_max_stmt_executions_int (loop);
   if (estimated_niter != -1

it's clearer if the old code is completely in a else {} path
even though vect_vf_for_cost - 1 should never be -1.

+/* Decides whether we need to create an epilogue loop to handle
+   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  
*/
+      
+void                  
+determine_peel_for_niter (loop_vec_info loop_vinfo)
+{   
+  

extra vertical space

+  unsigned HOST_WIDE_INT const_vf;
+  HOST_WIDE_INT max_niter 

if it's a 1:1 copy outlined then split it out - preapproved
(so further reviews get smaller patches ;))  I'd add a
LOOP_VINFO_PEELING_FOR_NITER () = false as final else
since that's what we do by default?

-  if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
+  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
+      || ((orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+         && LOOP_REQUIRES_VERSIONING (orig_loop_vinfo)))

not sure why we need to do this for epilouges?

+
+      /*  Use the same condition as vect_transform_loop to decide when to 
use
+         the cost to determine a versioning threshold.  */
+      if (th >= vect_vf_for_cost (loop_vinfo)
+         && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+         && ordered_p (th, niters_th))
+       niters_th = ordered_max (poly_uint64 (th), niters_th);

that's an independent change, right?  Please split out, it's
pre-approved if it tests OK separately.

+static tree
+replace_ops (tree op, hash_map<tree, tree> &mapping)
+{

I'm quite sure I've seen such beast elsewhere ;)  simplify_replace_tree
comes up first (not a 1:1 match but hints at a possible tree
sharing issue in your variant).

@@ -8497,11 +8588,11 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   if (th >= vect_vf_for_cost (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
-      if (dump_enabled_p ())
-       dump_printf_loc (MSG_NOTE, vect_location,
-                        "Profitability threshold is %d loop 
iterations.\n",
-                         th);
-      check_profitability = true;
+       if (dump_enabled_p ())
+         dump_printf_loc (MSG_NOTE, vect_location,
+                          "Profitability threshold is %d loop 
iterations.\n",
+                          th);
+       check_profitability = true;
     }

   /* Make sure there exists a single-predecessor exit bb.  Do this before

obvious (separate)

+  tree advance;
   epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, 
&niters_vector,
                              &step_vector, &niters_vector_mult_vf, th,
-                             check_profitability, niters_no_overflow);
+                             check_profitability, niters_no_overflow,
+                             &advance);
+
+  if (epilogue)
+    {
+      basic_block *orig_bbs = get_loop_body (loop);
+      loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
...

please record this in vect_do_peeling itself and store the
orig_stmts/drs/etc. in the epilogue loop_vinfo and ...

+      /* We are done vectorizing the main loop, so now we update the 
epilogues
+        stmt_vec_info's.  At the same time we set the gimple UID of each
+        statement in the epilogue, as these are used to look them up in 
the
+        epilogues loop_vec_info later.  We also keep track of what
...

split this out to a new function.  I wonder why you need to record
the DRs, are they not available via ->datarefs and lookup_dr ()?

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
601a6f55fbff388c89f88d994e790aebf2bf960e..201549da6c0cbae0797a23ae1b8967b9895505e9 
100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6288,7 +6288,7 @@ ensure_base_align (dr_vec_info *dr_info)

       if (decl_in_symtab_p (base_decl))
        symtab_node::get (base_decl)->increase_alignment (align_base_to);
-      else
+      else if (DECL_ALIGN (base_decl) < align_base_to)
        {
          SET_DECL_ALIGN (base_decl, align_base_to);
           DECL_USER_ALIGN (base_decl) = 1;

split out - preapproved.

Still have to go over the main loop doing the analysis/transform.

Thanks, it looks really promising (albeit exepectedly ugly due to
the data rewriting).

Richard.


> gcc/ChangeLog:
> 2019-10-10  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>     PR 88915
>     * cfgloop.h (loop): Add epilogue_vsizes member.
>     * cfgloop.c (flow_loop_free): Release epilogue_vsizes.
>     (alloc_loop): Initialize epilogue_vsizes.
>     * gentype.c (main): Add poly_uint64 type and vector_sizes to
>     generator.
>     * tree-vect-loop.c (vect_get_loop_niters): Make externally visible.
>     (_loop_vec_info): Initialize epilogue_vinfos.
>     (~_loop_vec_info): Release epilogue_vinfos.
>     (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>     number of iterations of epilogue.
>     (determine_peel_for_niter): New. Outlined code to re-use in two
>     places.
>     (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>     vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
>     versioning threshold needed for main loop.
>     (vect_analyze_loop): Likewise.
>     (replace_ops): New helper function.
>     (vect_transform_loop): When vectorizing epilogues re-use analysis done
>     on main loop and update necessary information.
>     * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>     stmts on loop preheader edge.
>     (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>     we decided to vectorize epilogues.  Update epilogues NITERS and
>     construct ADVANCE to update epilogues data references where needed.
>     (vect_loop_versioning): Moved decision to check_profitability
>     based on cost model.
>     * tree-vect-stmts.c (ensure_base_align): Only update alignment
>     if new alignment is lower.
>     * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos member.
>     (vect_loop_versioning, vect_do_peeling, vect_get_loop_niters,
>     vect_update_inits_of_drs, determine_peel_for_niter,
>     vect_analyze_loop): Add or update declarations.
>     * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>     create loop_vec_info's for epilogues when available.  Otherwise analyse
>     epilogue separately.
> 
> 
> 
> Cheers,
> Andre
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

  reply	other threads:[~2019-10-11 12:57 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 17:17 Andre Vieira (lists)
2019-08-26 13:17 ` Richard Biener
2019-09-18 11:11   ` Andre Vieira (lists)
2019-10-08 13:16   ` Andre Vieira (lists)
2019-10-09  9:06     ` Richard Biener
2019-10-10 14:01       ` Andre Vieira (lists)
2019-10-11 13:01         ` Richard Biener [this message]
2019-10-22 12:53           ` Andre Vieira (lists)
2019-10-22 14:05             ` Richard Biener
2019-10-25 16:18               ` Andre Vieira (lists)
2019-10-28 13:16                 ` Richard Biener
2019-10-22 18:07             ` Richard Sandiford
2019-10-25 16:20               ` Andre Vieira (lists)
2019-10-25 16:27                 ` Andre Vieira (lists)
2019-10-28 14:18                   ` Richard Biener
2019-10-28 18:37                     ` Andre Vieira (lists)
2019-10-29 12:12                       ` Richard Biener
2019-10-29 13:17                         ` [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops) Andre Vieira (lists)
2019-10-29 13:45                           ` Richard Biener
2019-09-04 15:35 ` [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.76.1910111408390.5566@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).