-----  Respin of the below patch -----

In this 2/2 patch, from v1 to v2 I have:

* Removed the modification to the interface of the doloop_end target-insn
  (so I no longer need to touch any other target backends).

* Added more modes to `arm_get_required_vpr_reg` to make it flexible
  between searching: all operands/only input arguments/only outputs.
  Also added helpers:
  `arm_get_required_vpr_reg_ret_val`
  `arm_get_required_vpr_reg_param`

* Added support for the use of other VPR predicate values within
  a dlstp/letp loop, as long as they don't originate from the
  vctp-generated VPR value. Also changed `arm_mve_get_loop_unique_vctp`
  to the simpler `arm_mve_get_loop_vctp`, since now we can support other
  VCTP insns within the loop.

* Added support for loops of the form:

     int num_of_iters = (num_of_elem + num_of_lanes - 1) / num_of_lanes;
     for (i = 0; i < num_of_iters; i++)
       {
         p = vctp (num_of_elem);
         num_of_elem -= num_of_lanes;
       }

  to be transformed into dlstp/letp loops.

* Changed the VCTP look-ahead for SIGN_EXTEND and SUBREG insns to
  use df def/use chains instead of `next_nonnote_nondebug_insn_bb`.

* Added support for using unpredicated (but predicable) insns
  within the dlstp/letp loop. These need to meet some specific conditions,
  because they _will_ become implicitly tail predicated by the dlstp/letp
  transformation.

* Added a df chain check to any other instructions to make sure that they
  don't USE the VCTP-generated VPR value.

* Added testing of all these various edge cases.


Original email with updated Changelog at the end:

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.
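For illustration, the loop form from the v2 notes can be modelled in plain C without any MVE intrinsics. The sketch below is only a model: `model_vctp`, `sum_iteration_counted`, `sum_element_counted` and the fixed `NUM_OF_LANES` of 4 are hypothetical names and assumptions, not part of the patch. It shows that a loop driven by a precomputed iteration count and a loop driven directly by the remaining element count (roughly the shape a dlstp/letp loop takes) process the same elements.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 32-bit elements in a 128-bit vector, hence 4 lanes.  */
#define NUM_OF_LANES 4

/* Hypothetical stand-in for a vctp intrinsic: lane L is active
   only while L is below the number of remaining elements.  */
void
model_vctp (int remaining, bool pred[NUM_OF_LANES])
{
  for (int lane = 0; lane < NUM_OF_LANES; lane++)
    pred[lane] = lane < remaining;
}

/* Iteration-counted form: the trip count is precomputed and the
   element count is decremented separately inside the loop.  */
int
sum_iteration_counted (const int *a, int num_of_elem)
{
  int num_of_iters = (num_of_elem + NUM_OF_LANES - 1) / NUM_OF_LANES;
  int sum = 0;
  for (int i = 0; i < num_of_iters; i++)
    {
      bool p[NUM_OF_LANES];
      model_vctp (num_of_elem, p);
      for (int lane = 0; lane < NUM_OF_LANES; lane++)
        if (p[lane])
          sum += a[i * NUM_OF_LANES + lane];
      num_of_elem -= NUM_OF_LANES;
    }
  return sum;
}

/* Element-counted form: the loop counter is the number of elements
   left, decremented by the lane count on every iteration.  */
int
sum_element_counted (const int *a, int num_of_elem)
{
  int sum = 0;
  int base = 0;
  while (num_of_elem > 0)
    {
      bool p[NUM_OF_LANES];
      model_vctp (num_of_elem, p);
      for (int lane = 0; lane < NUM_OF_LANES; lane++)
        if (p[lane])
          sum += a[base + lane];
      base += NUM_OF_LANES;
      num_of_elem -= NUM_OF_LANES;
    }
  return sum;
}
```

In both functions the predicate masks off the tail lanes on the final iteration, so no element past `num_of_elem` is read.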
Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now had an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4) Appropriate changes to the define_expand of doloop_end and new
   patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
   doloop_end) this function checks for the loop's suitability for
   dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
   the loop contents and returns the vctp VPR-generating operation
   within the loop, if it is unique and there is exclusively one
   vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

        * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
        * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
        (arm_mve_get_vctp_lanes): New.
        (arm_get_required_vpr_reg): New.
        (arm_get_required_vpr_reg_ret_val): New.
        (arm_get_required_vpr_reg_param): New.
        (arm_mve_get_loop_vctp): New.
        (arm_attempt_dlstp_transform): New.
        (arm_allow_elementwise_doloop): New.
        * config/arm/iterators.md (DLSTP): New.
        (mode1): Add DLSTP mappings.
        * config/arm/mve.md (*predicated_doloop_end_internal): New.
        (dlstp_insn): New.
        * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
        * config/arm/unspecs.md: New unspecs.
        * tm.texi: Document new hook.
        * tm.texi.in: Likewise.
        * loop-doloop.cc (doloop_condition_get): Relax conditions.
        (doloop_optimize): Add support for elementwise LoLs.
        * target.def (allow_elementwise_doloop): New hook.
        * targhooks.cc (default_allow_elementwise_doloop): New.
        * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

        * gcc.target/arm/lob.h: Update framework.
        * gcc.target/arm/lob1.c: Likewise.
        * gcc.target/arm/lob6.c: Likewise.
        * gcc.target/arm/dlstp-int16x8.c: New test.
        * gcc.target/arm/dlstp-int32x4.c: New test.
        * gcc.target/arm/dlstp-int64x2.c: New test.
        * gcc.target/arm/dlstp-int8x16.c: New test.
        * gcc.target/arm/dlstp-invalid-asm.c: New test.
        * gcc.target/arm/dlstp-compile-asm.c: Add testcases.
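
For reference, the lane counts in the new test names (int8x16, int16x8, int32x4, int64x2) follow from the 128-bit MVE vector width, which is the kind of mapping item 7 describes for `arm_mve_get_vctp_lanes`. The sketch below is a minimal model of that mapping only; the enum and function names are hypothetical, since the actual unspec identifiers and the function body are not shown in this email.

```c
#include <assert.h>

/* Hypothetical enum standing in for the vctp unspecs; the real
   identifiers used by the patch are not shown in this email.  */
enum model_vctp_kind
{
  MODEL_VCTP8,
  MODEL_VCTP16,
  MODEL_VCTP32,
  MODEL_VCTP64
};

/* MVE vector registers are 128 bits wide.  */
#define MODEL_MVE_VECTOR_BITS 128

/* Return the number of lanes predicated by the given vctp variant,
   or 0 if the variant is not recognised.  */
int
model_vctp_lanes (enum model_vctp_kind kind)
{
  switch (kind)
    {
    case MODEL_VCTP8:
      return MODEL_MVE_VECTOR_BITS / 8;
    case MODEL_VCTP16:
      return MODEL_MVE_VECTOR_BITS / 16;
    case MODEL_VCTP32:
      return MODEL_MVE_VECTOR_BITS / 32;
    case MODEL_VCTP64:
      return MODEL_MVE_VECTOR_BITS / 64;
    }
  return 0;
}
```

The lane count is also the amount by which the element counter drops on each iteration of a tail-predicated loop, which is why the mid-end has to accept decrements other than -1.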