Hi Richi, I think this is what you meant, I now hide all the unrolling cost calculations in the existing target hooks for costs. I did need to adjust 'finish_cost' to take the loop_vinfo so the target's implementations are able to set the newly renamed 'suggested_unroll_factor'. Also added the checks for the epilogue's VF. Is this more like what you had in mind? gcc/ChangeLog:         * config/aarch64/aarch64.c (aarch64_finish_cost): Add class vec_info parameter.         * config/i386/i386.c (ix86_finish_cost): Likewise.         * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.         * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.         * target.def: Add class vec_info parameter to finish_cost.         * targhooks.c (default_finish_cost): Likewise.         * targhooks.h (default_finish_cost): Likewise.         * tree-vect-loop.c (vect_determine_vectorization_factor): Use suggested_unroll_factor         to increase vectorization_factor if possible.         (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor member.         (vect_compute_single_scalar_iteration_cost): Adjust call to finish_cost.         (vect_determine_partial_vectors_and_peeling): Ensure unrolled loop is not predicated.         (vect_determine_unroll_factor): New.         (vect_try_unrolling): New.         (vect_reanalyze_as_main_loop): Also try to unroll when reanalyzing as main loop.         (vect_analyze_loop): Add call to vect_try_unrolling and check to ensure epilogue         is either a smaller VF than main loop or uses partial vectors and might be of equal         VF.         (vect_estimate_min_profitable_iters): Adjust call to finish_cost.         (vectorizable_reduction): Make sure to not use single_defuse_cyle when unrolling.         * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to finish_cost.         * tree-vectorizer.h (finish_cost): Change to pass new class vec_info parameter. On 01/10/2021 09:19, Richard Biener wrote: > On Thu, 30 Sep 2021, Andre Vieira (lists) wrote: > >> Hi, >> >> >>>> That just forces trying the vector modes we've tried before. Though I might >>>> need to revisit this now I think about it. I'm afraid it might be possible >>>> for >>>> this to generate an epilogue with a vf that is not lower than that of the >>>> main >>>> loop, but I'd need to think about this again. >>>> >>>> Either way I don't think this changes the vector modes used for the >>>> epilogue. >>>> But maybe I'm just missing your point here. >>> Yes, I was refering to the above which suggests that when we vectorize >>> the main loop with V4SF but unroll then we try vectorizing the >>> epilogue with V4SF as well (but not unrolled). I think that's >>> premature (not sure if you try V8SF if the main loop was V4SF but >>> unrolled 4 times). >> My main motivation for this was because I had a SVE loop that vectorized with >> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll >> V8HI by two and skipped using VNx8HI as a predicated epilogue which would've >> been the best choice. > I see, yes - for fully predicated epilogues it makes sense to consider > the same vector mode as for the main loop anyways (independent on > whether we're unrolling or not). One could argue that with an > unrolled V4SImode main loop a predicated V8SImode epilogue would also > be a good match (but then somehow costing favored the unrolled V4SI > over the V8SI for the main loop...). > >> So that is why I decided to just 'reset' the vector_mode selection. In a >> scenario where you only have the traditional vector modes it might make less >> sense. >> >> Just realized I still didn't add any check to make sure the epilogue has a >> lower VF than the previous loop, though I'm still not sure that could happen. >> I'll go look at where to add that if you agree with this. > As said above, it only needs a lower VF in case the epilogue is not > fully masked - otherwise the same VF would be OK. > >>>> I can move it there, it would indeed remove the need for the change to >>>> vect_update_vf_for_slp, the change to >>>> vect_determine_partial_vectors_and_peeling would still be required I think. >>>> It >>>> is meant to disable using partial vectors in an unrolled loop. >>> Why would we disable the use of partial vectors in an unrolled loop? >> The motivation behind that is that the overhead caused by generating >> predicates for each iteration will likely be too much for it to be profitable >> to unroll. On top of that, when dealing with low iteration count loops, if >> executing one predicated iteration would be enough we now still need to >> execute all other unrolled predicated iterations, whereas if we keep them >> unrolled we skip the unrolled loops. > OK, I guess we're not factoring in costs when deciding on predication > but go for it if it's gernally enabled and possible. > > With the proposed scheme we'd then cost the predicated not unrolled > loop against a not predicated unrolled loop which might be a bit > apples vs. oranges also because the target made the unroll decision > based on the data it collected for the predicated loop. > >>> Sure but I'm suggesting you keep the not unrolled body as one way of >>> costed vectorization but then if the target says "try unrolling" >>> re-do the analysis with the same mode but a larger VF. Just like >>> we iterate over vector modes you'll now iterate over pairs of >>> vector mode + VF (unroll factor). It's not about re-using the costing >>> it's about using costing that is actually relevant and also to avoid >>> targets inventing two distinct separate costings - a target (powerpc) >>> might already compute load/store density and other stuff for the main >>> costing so it should have an idea whether doubling or triplicating is OK. >>> >>> Richard. >> Sounds good! I changed the patch to determine the unrolling factor later, >> after all analysis has been done and retry analysis if an unrolling factor >> larger than 1 has been chosen for this loop and vector_mode. >> >> gcc/ChangeLog: >> >>         * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR. >>         * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR. >>         * params.opt: Add vect-unroll and vect-unroll-reductions >> parameters. > What's the reason to add the --params? It looks like this makes > us unroll with a static number short-cutting the target. > > IMHO that's never going to be a great thing - but what we could do > is look at loop->unroll and try to honor that (factoring in that > the vectorization factor is already the times we unroll). > > So I'd leave those params out for now, the user would have a much > more fine-grained way to control this with the unroll pragma. > > Adding a max-vect-unroll parameter would be another thing but that > would apply after the targets or pragma decision. > >>         * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR. > I still do not like the new target hook - as said I'd like to > make you have the finis_cost hook allow the target to specify > a suggested unroll factor instead because that's the point where > it has all the info. > > Thanks, > Richard. > >>         * targhooks.c (default_unroll_factor): New. >>         * targhooks.h (default_unroll_factor): Likewise. >>         * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize >>         par_unrolling_factor. >>         (vect_determine_partial_vectors_and_peeling): Account for >> unrolling. >>         (vect_determine_unroll_factor): New. >>         (vect_try_unrolling): New. >>         (vect_reanalyze_as_main_loop): Call vect_try_unrolling when >>         retrying a loop_vinfo as a main loop. >>         (vect_analyze_loop): Call vect_try_unrolling when vectorizing >> main loops. >>         (vect_analyze_loop): Allow for epilogue vectorization when unrolling >>         and rewalk vector_mode warray for the epilogues. >>         (vectorizable_reduction): Disable single_defuse_cycle when >> unrolling. >>         * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor >>         as a member of loop_vec_info. >>