Hi, I've split this particular part off, since it's not only relevant to unrolling. The new test shows how this is useful for existing (non-unrolling) cases. I also had to fix the costing function, the main_vf / epilogue_vf calculations for old and new didn't take into consideration that the main_vf could be lower, nor did it take into consideration that they were not necessarily always a multiple of each other.  So using CEIL here is the correct approach. Bootstrapped and regression tested on aarch64-none-linux-gnu. OK for trunk? gcc/ChangeLog:         * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for epilogue costing.         (vect_analyze_loop): Re-analyze all modes for epilogues. gcc/testsuite/ChangeLog:         * gcc.target/aarch64/masked_epilogue.c: New test. On 30/11/2021 13:56, Richard Biener wrote: > On Tue, 30 Nov 2021, Andre Vieira (lists) wrote: > >> On 25/11/2021 12:46, Richard Biener wrote: >>> Oops, my fault, yes, it does. I would suggest to refactor things so >>> that the mode_i = first_loop_i case is there only once. I also wonder >>> if all the argument about starting at 0 doesn't apply to the >>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well? So >>> what's the reason to differ here? So in the end I'd just change >>> the existing >>> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) >>> { >>> >>> to >>> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo) >>> || first_loop_vinfo->suggested_unroll_factor > 1) >>> { >>> >>> and maybe revisit this when we have an actual testcase showing that >>> doing sth else has a positive effect? >>> >>> Thanks, >>> Richard. >> So I had a quick chat with Richard Sandiford and he is suggesting resetting >> mode_i to 0 for all cases. >> >> He pointed out that for some tunings the SVE mode might come after the NEON >> mode, which means that even for not-unrolled loop_vinfos we could end up with >> a suboptimal choice of mode for the epilogue. I.e. it could be that we pick >> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd not >> try VNx16QI for the epilogue. >> >> This would simplify the mode selecting cases, by just simply restarting at >> mode_i in all epilogue cases. Is that something you'd be OK? > Works for me with an updated comment. Even better with showing a > testcase exercising such tuning. > > Richard.