From: Richard Biener
Date: Wed, 23 Sep 2020 09:58:59 +0200
Subject: Re: [PATCH] vect: Fix epilogue loop handling of partial vectors
To: Andrea Corallo, "Kewen.Lin", GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener, Richard Sandiford
List-Id: Gcc-patches mailing list

On Tue, Sep 22, 2020 at 4:34 PM Richard Sandiford wrote:
>
> Richard Sandiford writes:
> > I'll try to have a patch ready tomorrow morning European time.
>
> Well, I totally failed to hit that deadline. When testing on Power,
> I saw a couple of extra failures, but I now think they're improvements
> rather than regressions. See the point about single-iteration
> epilogues below.
>
> ---
>
> This patch fixes the fallout that Kewen reported on Power after
> the recent change to avoid unnecessary use of partial vectors.
> As Kewen said, the problem is that vect_analyze_loop_2 doesn't
> know how many epilogue iterations there will be, and so it
> cannot make a final decision about whether the number of
> iterations forces an epilogue loop to use partial vectors.
>
> This is similar to the current situation for peeling: we don't know
> during initial analysis whether an epilogue loop will itself require
> peeling. Instead we decide that during vect_do_peeling, where the
> final number of epilogue loop iterations is known.
>
> The patch takes a similar approach for the decision about whether
> to use partial vectors. As the comments in the patch say, the
> idea is that vect_analyze_loop_2 should make peeling and partial-
> vector decisions based on the assumption that the loop_vinfo will
> be used as the main loop, while vect_do_peeling should make them
> in the knowledge that the loop_vinfo will be used as an epilogue loop.
>
> This allows the same analysis to be used for both cases, which we
> rely on for implementing VECT_COMPARE_COSTS; see the big comment
> in vect_analyze_loop for details.
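As a rough illustration of the decision described above, here is a standalone C model of what vect_determine_partial_vectors_and_peeling chooses between. It is a simplified sketch, not GCC code: struct loop_model, partial_vector_usage and both functions are hypothetical stand-ins for the loop_vec_info accessors and --param vect-partial-vector-usage. Vectorization factors are plain integers rather than poly_uint64, peeling for alignment is ignored, and "known to run fewer than VF iterations" is approximated by a known iteration count below VF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info fields involved.  */
struct loop_model
{
  /* Inputs.  */
  bool can_use_partial_vectors_p;  /* LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P */
  bool epilogue_p;                 /* LOOP_VINFO_EPILOGUE_P */
  bool niters_known_p;             /* LOOP_VINFO_NITERS_KNOWN_P */
  unsigned long niters;            /* Scalar iterations, if known.  */
  unsigned vf;                     /* Constant vectorization factor.  */
  bool peeling_for_gaps;           /* LOOP_VINFO_PEELING_FOR_GAPS */
  /* Outputs.  */
  bool using_partial_vectors_p;
  bool epil_using_partial_vectors_p;
  bool peeling_for_niter;
};

/* Models --param vect-partial-vector-usage (0, 1 or 2).  With 0 the
   caller is expected to leave can_use_partial_vectors_p false.  */
static int partial_vector_usage = 2;

/* Would operating on full vectors leave scalar iterations over?
   Simplified: unknown counts are assumed to need it, and peeling for
   alignment is ignored.  */
static bool
need_peeling_or_partial_vectors_p (const struct loop_model *l)
{
  assert (l->vf > 0);
  if (!l->niters_known_p)
    return true;
  unsigned long peel = l->peeling_for_gaps ? 1 : 0;
  return l->niters < peel || (l->niters - peel) % l->vf != 0;
}

/* Returns false where the real function returns a failure.  */
static bool
determine_partial_vectors_and_peeling (struct loop_model *l)
{
  bool need = need_peeling_or_partial_vectors_p (l);

  l->using_partial_vectors_p = false;
  l->epil_using_partial_vectors_p = false;
  if (l->can_use_partial_vectors_p && need)
    {
      /* Usage level 1: keep this loop on full vectors and push partial
         vectors to the epilogue (choice 2a), unless this already is an
         epilogue or the loop runs fewer than VF iterations.  */
      if (partial_vector_usage == 1
          && !l->epilogue_p
          && !(l->niters_known_p && l->niters < l->vf))
        l->epil_using_partial_vectors_p = true;
      else
        l->using_partial_vectors_p = true;   /* Choice 1.  */
    }

  if (l->niters_known_p && !l->using_partial_vectors_p)
    {
      /* Need at least one full vector...  */
      if (l->niters < l->vf)
        return false;
      /* ...and one more scalar iteration when peeling for gaps.  */
      if (l->peeling_for_gaps && l->niters - 1 < l->vf)
        return false;
    }

  l->peeling_for_niter = !l->using_partial_vectors_p && need;
  return true;
}
```

In this model the same routine serves both roles: only the iteration count installed beforehand and the epilogue flag differ between the main-loop analysis and the later epilogue decision. For example, 10 iterations at VF 4 with partial vectors available give a fully masked loop at usage level 2, but full vectors plus a peeled epilogue at level 1.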
>
> I hope the patch makes the (mostly preexisting) structure a bit
> more obvious. It isn't what anyone would design from scratch,
> but that's the nature of working with a mature vector framework.

Indeed :/

> Arranging things this way means that vect_verify_full_masking
> and vect_verify_loop_lens now become part of the “can” rather
> than “will” test for partial vectors.
>
> Also, while splitting out the logic that handles epilogues with
> constant iterations, I added a check to make sure that we don't
> try to use partial vectors to vectorise a single-scalar loop.
> This required some changes to the Power tests.
>
> Tested on aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu. OK to install?

Thanks for the nice work.

OK.

Richard.

> Richard
>
>
> gcc/
> 	* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
> 	(vect_determine_partial_vectors_and_peeling): ...this new function.
> 	* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
> 	Reject using vector epilogue loops for single iterations. Install
> 	the constant number of epilogue loop iterations in the associated
> 	loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling
> 	to do the main part of the test.
> 	(vect_do_peeling): Use vect_update_epilogue_niters to handle
> 	epilogue loops with a known number of iterations. Skip recomputing
> 	the number of iterations later in that case. Otherwise, use
> 	vect_determine_partial_vectors_and_peeling to decide whether the
> 	epilogue loop needs to use partial vectors or peeling.
> 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
> 	default can_use_partial_vectors_p to false if partial-vector-usage=0.
> 	(determine_peel_for_niter): Remove in favor of...
> 	(vect_determine_partial_vectors_and_peeling): ...this new function,
> 	split out from...
> 	(vect_analyze_loop_2): ...here.
> 	Reflect the vect_verify_full_masking
> 	and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
> 	rather than USING_PARTIAL_VECTORS_P.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
> 	single-iteration epilogues of the 64-bit loops to be vectorized.
> 	* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
> 	* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
> ---
>  .../gcc.target/powerpc/p9-vec-length-epil-1.c |   4 +-
>  .../gcc.target/powerpc/p9-vec-length-epil-7.c |   2 +-
>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |   4 +-
>  gcc/tree-vect-loop-manip.c                    |  83 +++++---
>  gcc/tree-vect-loop.c                          | 196 ++++++++++++------
>  gcc/tree-vectorizer.h                         |   3 +-
>  6 files changed, 192 insertions(+), 100 deletions(-)
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 9dffc5570e5..b7fa6bc8d2f 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.c */
> -extern void determine_peel_for_niter (loop_vec_info);
> +extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info,
> +                                                              bool);
>  /* Used in gimple-loop-interchange.c and tree-parloops.c. */
>  extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
>                                    enum tree_code);
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 47cfa6f4061..7cf00e6eed4 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
>        rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
>  }
>
> +/* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> +   iterate exactly CONST_NITERS times.
> +   Make a final decision about
> +   whether the epilogue loop should be used, returning true if so. */
> +
> +static bool
> +vect_update_epilogue_niters (loop_vec_info epilogue_vinfo,
> +                             unsigned HOST_WIDE_INT const_niters)
> +{
> +  /* Avoid wrap-around when computing const_niters - 1. Also reject
> +     using an epilogue loop for a single scalar iteration, even if
> +     we could in principle implement that using partial vectors. */
> +  unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
> +  if (const_niters <= gap_niters + 1)
> +    return false;
> +
> +  /* Install the number of iterations. */
> +  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo));
> +  tree niters_tree = build_int_cst (niters_type, const_niters);
> +  tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1);
> +
> +  LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree;
> +  LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree;
> +
> +  /* Decide what to do if the number of epilogue iterations is not
> +     a multiple of the epilogue loop's vectorization factor. */
> +  return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true);
> +}
> +
>  /* Function vect_do_peeling.
>
>     Input:
> @@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    int estimated_vf;
>    int prolog_peeling = 0;
>    bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> +  bool vect_epilogues_updated_niters = false;
>    /* We currently do not support prolog peeling if the target alignment is not
>       known at compile time. 'vect_gen_prolog_loop_niters' depends on the
>       target alignment being constant. */
> @@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    if (vect_epilogues
>        && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && prolog_peeling >= 0
> -      && known_eq (vf, lowest_vf)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
> +      && known_eq (vf, lowest_vf))
>      {
>        unsigned HOST_WIDE_INT eiters
>          = (LOOP_VINFO_INT_NITERS (loop_vinfo)
> @@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        eiters
>          = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
>
> -      unsigned int ratio;
> -      unsigned int epilogue_gaps
> -        = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
> -      while (!(constant_multiple_p
> -               (GET_MODE_SIZE (loop_vinfo->vector_mode),
> -                GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio)
> -               && eiters >= lowest_vf / ratio + epilogue_gaps))
> +      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
>          {
>            delete epilogue_vinfo;
>            epilogue_vinfo = NULL;
> @@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>              }
>            epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
>            loop_vinfo->epilogue_vinfos.ordered_remove (0);
> -          epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
>          }
> +      vect_epilogues_updated_niters = true;
>      }
>    /* Prolog loop may be skipped. */
>    bool skip_prolog = (prolog_peeling != 0);
> @@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>           skip_e edge. */
>        if (skip_vector)
>          {
> -          gcc_assert (update_e != NULL && skip_e != NULL);
> +          gcc_assert (update_e != NULL
> +                      && skip_e != NULL
> +                      && !vect_epilogues_updated_niters);
>            gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)),
>                                             update_e->dest);
>            tree new_ssa = make_ssa_name (TREE_TYPE (niters));
> @@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>            niters = PHI_RESULT (new_phi);
>          }
>
> -      /* Subtract the number of iterations performed by the vectorized loop
> -         from the number of total iterations. */
> -      tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
> -                                          before_loop_niters,
> -                                          niters);
> -
> -      LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
> -      LOOP_VINFO_NITERSM1 (epilogue_vinfo)
> -        = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
> -                       epilogue_niters,
> -                       build_one_cst (TREE_TYPE (epilogue_niters)));
> -
>        /* Set ADVANCE to the number of iterations performed by the previous
>           loop and its prologue. */
>        *advance = niters;
>
> -      /* Redo the peeling for niter analysis as the NITERs and alignment
> -         may have been updated to take the main loop into account. */
> -      determine_peel_for_niter (epilogue_vinfo);
> +      if (!vect_epilogues_updated_niters)
> +        {
> +          /* Subtract the number of iterations performed by the vectorized loop
> +             from the number of total iterations. */
> +          tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
> +                                              before_loop_niters,
> +                                              niters);
> +
> +          LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
> +          LOOP_VINFO_NITERSM1 (epilogue_vinfo)
> +            = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
> +                           epilogue_niters,
> +                           build_one_cst (TREE_TYPE (epilogue_niters)));
> +
> +          /* Decide what to do if the number of epilogue iterations is not
> +             a multiple of the epilogue loop's vectorization factor.
> +             We should have rejected the loop during the analysis phase
> +             if this fails. */
> +          if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
> +                                                           true))
> +            gcc_unreachable ();
> +        }
>      }
>
>    adjust_vec.release ();
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index b1a6e1508c7..bb9d87fd200 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -814,7 +814,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      vec_outside_cost (0),
>      vec_inside_cost (0),
>      vectorizable (false),
> -    can_use_partial_vectors_p (true),
> +    can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>      using_partial_vectors_p (false),
>      epil_using_partial_vectors_p (false),
>      peeling_for_gaps (false),
> @@ -2003,22 +2003,123 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
>      }
>  }
>
> +/* Determine if operating on full vectors for LOOP_VINFO might leave
> +   some scalar iterations still to do. If so, decide how we should
> +   handle those scalar iterations. The possibilities are:
>
> -/* Decides whether we need to create an epilogue loop to handle
> -   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly. */
> +   (1) Make LOOP_VINFO operate on partial vectors instead of full vectors.
> +       In this case:
>
> -void
> -determine_peel_for_niter (loop_vec_info loop_vinfo)
> +         LOOP_VINFO_USING_PARTIAL_VECTORS_P == true
> +         LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
> +         LOOP_VINFO_PEELING_FOR_NITER == false
> +
> +   (2) Make LOOP_VINFO operate on full vectors and use an epilogue loop
> +       to handle the remaining scalar iterations. In this case:
> +
> +         LOOP_VINFO_USING_PARTIAL_VECTORS_P == false
> +         LOOP_VINFO_PEELING_FOR_NITER == true
> +
> +       There are two choices:
> +
> +       (2a) Consider vectorizing the epilogue loop at the same VF as the
> +            main loop, but using partial vectors instead of full vectors.
> +            In this case:
> +
> +              LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == true
> +
> +       (2b) Consider vectorizing the epilogue loop at lower VFs only.
> +            In this case:
> +
> +              LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
> +
> +   When FOR_EPILOGUE_P is true, make this determination based on the
> +   assumption that LOOP_VINFO is an epilogue loop, otherwise make it
> +   based on the assumption that LOOP_VINFO is the main loop. The caller
> +   has made sure that the number of iterations is set appropriately for
> +   this value of FOR_EPILOGUE_P. */
> +
> +opt_result
> +vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
> +                                            bool for_epilogue_p)
>  {
> -  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
> +  /* Determine whether there would be any scalar iterations left over. */
> +  bool need_peeling_or_partial_vectors_p
> +    = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
>
> -  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -    /* The main loop handles all iterations. */
> -    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
> -  else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> -    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> -}
> +  /* Decide whether to vectorize the loop with partial vectors. */
> +  LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +  LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +      && need_peeling_or_partial_vectors_p)
> +    {
> +      /* For partial-vector-usage=1, try to push the handling of partial
> +         vectors to the epilogue, with the main loop continuing to operate
> +         on full vectors.
> +
> +         ??? We could then end up failing to use partial vectors if we
> +         decide to peel iterations into a prologue, and if the main loop
> +         then ends up processing fewer than VF iterations. */
> +      if (param_vect_partial_vector_usage == 1
> +          && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +          && !vect_known_niters_smaller_than_vf (loop_vinfo))
> +        LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> +      else
> +        LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +        dump_printf_loc (MSG_NOTE, vect_location,
> +                         "operating on partial vectors%s.\n",
> +                         for_epilogue_p ? " for epilogue loop" : "");
> +      else
> +        dump_printf_loc (MSG_NOTE, vect_location,
> +                         "operating only on full vectors%s.\n",
> +                         for_epilogue_p ? " for epilogue loop" : "");
> +    }
>
> +  if (for_epilogue_p)
> +    {
> +      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      gcc_assert (orig_loop_vinfo);
> +      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +        gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                              LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)));
> +    }
> +
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +    {
> +      /* Check that the loop processes at least one full vector. */
> +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +      tree scalar_niters = LOOP_VINFO_NITERS (loop_vinfo);
> +      if (known_lt (wi::to_widest (scalar_niters), vf))
> +        return opt_result::failure_at (vect_location,
> +                                       "loop does not have enough iterations"
> +                                       " to support vectorization.\n");
> +
> +      /* If we need to peel an extra epilogue iteration to handle data
> +         accesses with gaps, check that there are enough scalar iterations
> +         available.
> +
> +         The check above is redundant with this one when peeling for gaps,
> +         but the distinction is useful for diagnostics. */
> +      tree scalar_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +          && known_lt (wi::to_widest (scalar_nitersm1), vf))
> +        return opt_result::failure_at (vect_location,
> +                                       "loop does not have enough iterations"
> +                                       " to support peeling for gaps.\n");
> +    }
> +
> +  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +    = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +       && need_peeling_or_partial_vectors_p);
> +
> +  return opt_result::success ();
> +}
>
>  /* Function vect_analyze_loop_2.
>
> @@ -2272,72 +2373,32 @@ start_over:
>        LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>      }
>
> -  /* Decide whether to vectorize a loop with partial vectors for
> -     this vectorization factor. */
> -  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> -    {
> -      /* Don't use partial vectors if we don't need to peel the loop. */
> -      if (param_vect_partial_vector_usage == 0
> -          || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> -        LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -      else if (vect_verify_full_masking (loop_vinfo)
> -               || vect_verify_loop_lens (loop_vinfo))
> -        {
> -          /* The epilogue and other known niters less than VF
> -             cases can still use vector access with length fully. */
> -          if (param_vect_partial_vector_usage == 1
> -              && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> -              && !vect_known_niters_smaller_than_vf (loop_vinfo))
> -            {
> -              LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -              LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -            }
> -          else
> -            LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -        }
> -      else
> -        LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -    }
> -  else
> -    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -
> -  if (dump_enabled_p ())
> -    {
> -      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -        dump_printf_loc (MSG_NOTE, vect_location,
> -                         "operating on partial vectors.\n");
> -      else
> -        dump_printf_loc (MSG_NOTE, vect_location,
> -                         "operating only on full vectors.\n");
> -    }
> -
> -  /* If epilog loop is required because of data accesses with gaps,
> -     one additional iteration needs to be peeled. Check if there is
> -     enough iterations for vectorization. */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -    {
> -      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
> -
> -      if (known_lt (wi::to_widest (scalar_niters), vf))
> -        return opt_result::failure_at (vect_location,
> -                                       "loop has no enough iterations to"
> -                                       " support peeling for gaps.\n");
> -    }
> +  /* If we still have the option of using partial vectors,
> +     check whether we can generate the necessary loop controls. */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +      && !vect_verify_full_masking (loop_vinfo)
> +      && !vect_verify_loop_lens (loop_vinfo))
> +    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>
>    /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>       to be able to handle fewer than VF scalars, or needs to have a lower VF
>       than the main loop. */
>    if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
>        && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>                     LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>      return opt_result::failure_at (vect_location,
>                                     "Vectorization factor too high for"
>                                     " epilogue loop.\n");
>
> +  /* Decide whether this loop_vinfo should use partial vectors or peeling,
> +     assuming that the loop will be used as a main loop. We will redo
> +     this analysis later if we instead decide to use the loop as an
> +     epilogue loop. */
> +  ok = vect_determine_partial_vectors_and_peeling (loop_vinfo, false);
> +  if (!ok)
> +    return ok;
> +
>    /* Check the costings of the loop make vectorizing worthwhile. */
>    res = vect_analyze_loop_costing (loop_vinfo);
>    if (res < 0)
> @@ -2350,7 +2411,6 @@ start_over:
>      return opt_result::failure_at (vect_location,
>                                     "Loop costings not worthwhile.\n");
>
> -  determine_peel_for_niter (loop_vinfo);
>    /* If an epilogue loop is required make sure we can create one. */
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>        || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> index ebb2f45c917..d248f091b52 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> @@ -10,6 +10,6 @@
>
>  /* { dg-final { scan-assembler-times {\mlxvx?\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mstxvx?\M} 10 } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 14 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> index 9d403287923..a27ee347ca1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> @@ -8,4 +8,4 @@
>
>  #include "p9-vec-length-7.h"
>
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> index 6b54a29efaa..961df0d5646 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> @@ -8,5 +8,5 @@
>
>  #include "p9-vec-length-8.h"
>
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 30 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
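The Power test adjustments follow from the new single-iteration check in vect_update_epilogue_niters. The sketch below models how vect_do_peeling walks the candidate epilogue loop_vinfos once the epilogue iteration count is a compile-time constant: the first candidate that is accepted is used, and if none is, the model reports that no vectorized epilogue is chosen (the corresponding lines of the real loop fall between the quoted hunks). It is a hypothetical simplification, not GCC code: struct epilogue_candidate, epilogue_usable_p and choose_epilogue are invented names, gap_niters stands for LOOP_VINFO_PEELING_FOR_GAPS (0 or 1), and the full-vector checks stand in for the relevant part of vect_determine_partial_vectors_and_peeling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one candidate epilogue loop_vinfo.  */
struct epilogue_candidate
{
  unsigned vf;                     /* Constant vectorization factor.  */
  unsigned gap_niters;             /* LOOP_VINFO_PEELING_FOR_GAPS: 0 or 1.  */
  bool can_use_partial_vectors_p;  /* LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P */
};

/* Would this candidate accept an epilogue of exactly CONST_NITERS scalar
   iterations?  Models the guard in vect_update_epilogue_niters followed
   by a simplified version of the known-niters checks.  */
static bool
epilogue_usable_p (const struct epilogue_candidate *c,
                   unsigned long const_niters)
{
  assert (c->vf > 0 && c->gap_niters <= 1);

  /* Reject a single scalar iteration; this also keeps
     const_niters - 1 from wrapping around.  */
  if (const_niters <= c->gap_niters + 1)
    return false;

  /* With partial vectors, any remaining count can be covered.  */
  if (c->can_use_partial_vectors_p)
    return true;

  /* Full vectors only: at least one full vector, plus one extra scalar
     iteration when peeling for gaps.  The guard above rules out
     unsigned wrap-around here.  */
  return const_niters - c->gap_niters >= c->vf;
}

/* Try the candidates in order and return the index of the first one
   accepted, or -1 if none is.  */
static int
choose_epilogue (const struct epilogue_candidate *cands, int n,
                 unsigned long eiters)
{
  for (int i = 0; i < n; i++)
    if (epilogue_usable_p (&cands[i], eiters))
      return i;
  return -1;
}
```

For example, with candidates {VF 4 using partial vectors, VF 2 with full vectors only}, an epilogue of one scalar iteration is rejected by both, whereas three iterations are accepted by the first candidate.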