Here's an updated version of my ldm/stm peepholes patch for current trunk. The main goal is to enable ldm/stm generation for Thumb by using define_peephole2 and peep2_find_free_reg rather than define_peephole; there are also one or two new peepholes to recognize additional opportunities.

I've rerun Cortex-A9 SPEC2000 benchmarks on our 4.4-based tree, where it still causes a tiny performance improvement. Please disregard the previous set of benchmark results (for limiting this to 3/4- or 4-operation sequences only); I think those results were invalid. I've retested these, and limiting the transformation in such a way seems to cause performance drops.

Previously there were requests to modify performance tuning, but there were no answers to my questions about how exactly I should go about it, and no information has been forthcoming about actual processor behaviour which could be used to implement meaningful tuning. As requested, I ran some benchmarks and posted results, which were also ignored (fortunately, see above). Since the patch isn't primarily intended to change code generation significantly on ARM/Thumb-2 code anyway (and given the performance results mentioned above), I feel it is unreasonable to hold this up any further. Additional improvements may be possible on top of it, but IMO it's a self-contained improvement as-is. An earlier patch already introduced the multiple_operation_profitable_p function, which can be used for tuning.

Tested with my usual arm-linux/qemu configuration. Earlier, I posted a fix for the PR44404 problem which showed up.

Ok?

Bernd