Hi!

This improves the stack usage of the sha512 test case when there is neither a hardware FPU nor iwmmxt, by splitting all DImode patterns right at expand time, similar to what the shift patterns already do. It changes nothing for iwmmxt, for fpu=neon or vfp, or for thumb1.

It reduces the stack usage from 2300 bytes to a near-optimal 272 bytes (!).

Note that this also splits many ldrd/strd instructions. I will therefore post a follow-up patch, once the necessary reg-testing is done, that mitigates this effect by enabling the ldrd/strd peephole optimization.

Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?

Thanks
Bernd.