Hi Richi,

I think this is what you meant, I now hide all the unrolling cost 
calculations in the existing target hooks for costs. I did need to 
adjust 'finish_cost' to take the loop_vinfo so the target's 
implementations are able to set the newly renamed 'suggested_unroll_factor'.

Also added the checks for the epilogue's VF.

Is this more like what you had in mind?


gcc/ChangeLog:

         * config/aarch64/aarch64.c (aarch64_finish_cost): Add class 
vec_info parameter.
         * config/i386/i386.c (ix86_finish_cost): Likewise.
         * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.
         * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.
         * target.def: Add class vec_info parameter to finish_cost.
         * targhooks.c (default_finish_cost): Likewise.
         * targhooks.h (default_finish_cost): Likewise.
         * tree-vect-loop.c (vect_determine_vectorization_factor): Use 
suggested_unroll_factor
         to increase vectorization_factor if possible.
         (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor 
member.
         (vect_compute_single_scalar_iteration_cost): Adjust call to 
finish_cost.
         (vect_determine_partial_vectors_and_peeling): Ensure unrolled 
loop is not predicated.
         (vect_determine_unroll_factor): New.
         (vect_try_unrolling): New.
         (vect_reanalyze_as_main_loop): Also try to unroll when 
reanalyzing as main loop.
         (vect_analyze_loop): Add call to vect_try_unrolling and check 
to ensure epilogue
         is either a smaller VF than main loop or uses partial vectors 
and might be of equal
         VF.
         (vect_estimate_min_profitable_iters): Adjust call to finish_cost.
         (vectorizable_reduction): Make sure to not use 
single_defuse_cyle when unrolling.
         * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust 
call to finish_cost.
         * tree-vectorizer.h (finish_cost): Change to pass new class 
vec_info parameter.

On 01/10/2021 09:19, Richard Biener wrote:
> On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>>
>>>> That just forces trying the vector modes we've tried before. Though I might
>>>> need to revisit this now I think about it. I'm afraid it might be possible
>>>> for
>>>> this to generate an epilogue with a vf that is not lower than that of the
>>>> main
>>>> loop, but I'd need to think about this again.
>>>>
>>>> Either way I don't think this changes the vector modes used for the
>>>> epilogue.
>>>> But maybe I'm just missing your point here.
>>> Yes, I was refering to the above which suggests that when we vectorize
>>> the main loop with V4SF but unroll then we try vectorizing the
>>> epilogue with V4SF as well (but not unrolled).  I think that's
>>> premature (not sure if you try V8SF if the main loop was V4SF but
>>> unrolled 4 times).
>> My main motivation for this was because I had a SVE loop that vectorized with
>> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll
>> V8HI by two and skipped using VNx8HI as a predicated epilogue which would've
>> been the best choice.
> I see, yes - for fully predicated epilogues it makes sense to consider
> the same vector mode as for the main loop anyways (independent on
> whether we're unrolling or not).  One could argue that with an
> unrolled V4SImode main loop a predicated V8SImode epilogue would also
> be a good match (but then somehow costing favored the unrolled V4SI
> over the V8SI for the main loop...).
>
>> So that is why I decided to just 'reset' the vector_mode selection. In a
>> scenario where you only have the traditional vector modes it might make less
>> sense.
>>
>> Just realized I still didn't add any check to make sure the epilogue has a
>> lower VF than the previous loop, though I'm still not sure that could happen.
>> I'll go look at where to add that if you agree with this.
> As said above, it only needs a lower VF in case the epilogue is not
> fully masked - otherwise the same VF would be OK.
>
>>>> I can move it there, it would indeed remove the need for the change to
>>>> vect_update_vf_for_slp, the change to
>>>> vect_determine_partial_vectors_and_peeling would still be required I think.
>>>> It
>>>> is meant to disable using partial vectors in an unrolled loop.
>>> Why would we disable the use of partial vectors in an unrolled loop?
>> The motivation behind that is that the overhead caused by generating
>> predicates for each iteration will likely be too much for it to be profitable
>> to unroll. On top of that, when dealing with low iteration count loops, if
>> executing one predicated iteration would be enough we now still need to
>> execute all other unrolled predicated iterations, whereas if we keep them
>> unrolled we skip the unrolled loops.
> OK, I guess we're not factoring in costs when deciding on predication
> but go for it if it's gernally enabled and possible.
>
> With the proposed scheme we'd then cost the predicated not unrolled
> loop against a not predicated unrolled loop which might be a bit
> apples vs. oranges also because the target made the unroll decision
> based on the data it collected for the predicated loop.
>
>>> Sure but I'm suggesting you keep the not unrolled body as one way of
>>> costed vectorization but then if the target says "try unrolling"
>>> re-do the analysis with the same mode but a larger VF.  Just like
>>> we iterate over vector modes you'll now iterate over pairs of
>>> vector mode + VF (unroll factor).  It's not about re-using the costing
>>> it's about using costing that is actually relevant and also to avoid
>>> targets inventing two distinct separate costings - a target (powerpc)
>>> might already compute load/store density and other stuff for the main
>>> costing so it should have an idea whether doubling or triplicating is OK.
>>>
>>> Richard.
>> Sounds good! I changed the patch to determine the unrolling factor later,
>> after all analysis has been done and retry analysis if an unrolling factor
>> larger than 1 has been chosen for this loop and vector_mode.
>>
>> gcc/ChangeLog:
>>
>>          * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR.
>>          * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR.
>>          * params.opt: Add vect-unroll and vect-unroll-reductions
>> parameters.
> What's the reason to add the --params?  It looks like this makes
> us unroll with a static number short-cutting the target.
>
> IMHO that's never going to be a great thing - but what we could do
> is look at loop->unroll and try to honor that (factoring in that
> the vectorization factor is already the times we unroll).
>
> So I'd leave those params out for now, the user would have a much
> more fine-grained way to control this with the unroll pragma.
>
> Adding a max-vect-unroll parameter would be another thing but that
> would apply after the targets or pragma decision.
>
>>          * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR.
> I still do not like the new target hook - as said I'd like to
> make you have the finis_cost hook allow the target to specify
> a suggested unroll factor instead because that's the point where
> it has all the info.
>
> Thanks,
> Richard.
>
>>          * targhooks.c (default_unroll_factor): New.
>>          * targhooks.h (default_unroll_factor): Likewise.
>>          * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>>          par_unrolling_factor.
>>          (vect_determine_partial_vectors_and_peeling): Account for
>> unrolling.
>>          (vect_determine_unroll_factor): New.
>>          (vect_try_unrolling): New.
>>          (vect_reanalyze_as_main_loop): Call vect_try_unrolling when
>>          retrying a loop_vinfo as a main loop.
>>          (vect_analyze_loop): Call vect_try_unrolling when vectorizing
>> main loops.
>>          (vect_analyze_loop): Allow for epilogue vectorization when unrolling
>>          and rewalk vector_mode warray for the epilogues.
>>          (vectorizable_reduction): Disable single_defuse_cycle when
>> unrolling.
>>          * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor
>>          as a member of loop_vec_info.
>>