This is work originally started by Joern @ Embecosm.

There's been a long-standing sense that we're generating too many sign/zero extensions on the RISC-V port. REE is useful, but it's really focused on a relatively narrow part of the extension problem.

What Joern's patch does is introduce a new pass which tracks liveness of chunks of pseudo regs. Specifically it tracks bits 0..7, 8..15, 16..31 and 32..63.

If it encounters a sign/zero extend that sets bits that are never read, then it replaces the sign/zero extension with a narrowing subreg. The narrowing subreg usually gets eliminated by subsequent passes (it's just a copy after all).

Jivan has done some analysis and found that it eliminates roughly 1% of the dynamic instruction stream for x264 as well as some redundant extensions in the coremark benchmark (both on rv64). In my own testing as I worked through issues on other architectures I clearly saw it helping in various places within GCC itself or in the testsuite.

The basic structure is to first do a fairly standard liveness analysis on the chunks, seeding the original state with the liveness data from DF. Once that's stable, we do a final pass to identify the useless extensions and transform them into narrowing subregs.

A few key points to remember.

For destination processing it is always safe to ignore a destination. Ignoring a destination merely means that whatever was live after the given insn will continue to be live before the insn. What is not safe is to clear a bit in the LIVENOW bitmap for a destination chunk that the insn does not actually set. This comes into play with things like STRICT_LOW_PART.

For source processing the safe thing to do is to set all the chunks in a register as live. It is never safe to fail to process a source operand.

When a destination object is not fully live, we try to transfer that limited liveness to the source operands.
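To make the chunk bookkeeping concrete, here is a minimal standalone sketch in C. It is not the code from the patch: the CHUNK_* masks and the three helper functions are hypothetical names made up for illustration, and it assumes a 64-bit pseudo split into the four chunks listed above. The PLUS helper also assumes that, because carries propagate upward, a live destination chunk needs every source chunk at or below it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is an illustration, not the
   code from the patch.  One bit per tracked chunk of a 64-bit pseudo.  */
enum
{
  CHUNK_0_7 = 1 << 0,
  CHUNK_8_15 = 1 << 1,
  CHUNK_16_31 = 1 << 2,
  CHUNK_32_63 = 1 << 3,
  CHUNK_ALL = 0xf
};

/* Return the chunks covered by the low WIDTH bits of a register.
   WIDTH must be 8, 16, 32 or 64.  */
static unsigned
chunks_for_width (unsigned width)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);
  switch (width)
    {
    case 8:
      return CHUNK_0_7;
    case 16:
      return CHUNK_0_7 | CHUNK_8_15;
    case 32:
      return CHUNK_0_7 | CHUNK_8_15 | CHUNK_16_31;
    default:
      return CHUNK_ALL;
    }
}

/* An extension from SRC_WIDTH bits only computes the chunks above
   SRC_WIDTH.  If none of those chunks is live after the insn, the
   extension does no useful work.  */
static bool
extension_is_useless (unsigned src_width, unsigned live_after)
{
  unsigned set_by_extension = CHUNK_ALL & ~chunks_for_width (src_width);
  return (set_by_extension & live_after) == 0;
}

/* Source chunks that must be live for a PLUS whose destination has
   DEST_LIVE chunks live.  Assumption of this sketch: carries move
   upward, so everything at or below the highest live chunk is needed,
   and nothing above it is.  */
static unsigned
source_chunks_for_plus (unsigned dest_live)
{
  for (unsigned bit = CHUNK_32_63; bit != 0; bit >>= 1)
    if (dest_live & bit)
      return (bit << 1) - 1;
  return 0;
}
```

With these helpers, a 32 to 64 bit extension whose result only has bits 0..31 live afterwards is reported as useless, which is the case the pass rewrites into a narrowing subreg.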
So for example, if bits 16..63 are dead in the destination of a PLUS, we need not mark bits 16..63 as live for the source operands. We have to be careful -- consider a shift count on a target without SHIFT_COUNT_TRUNCATED set. So we have both a list of RTL codes where we can transfer liveness and a few codes where one of the operands may need to be fully live (e.g., a shift count) while the other input may not need to be fully live (the value being left shifted).

Locally we have had this enabled at -O1 and above to encourage testing, but I'm thinking that for the trunk enabling at -O2 and above is the right thing to do.

This has (of course) been tested on rv64. It's also been bootstrapped and regression tested on x86. Bootstrapped and regression tested (C only) for m68k, sh4, sh4eb, alpha. Earlier versions were also bootstrapped and regression tested on ppc, hppa and s390x (C only for those as well). It's also been tested on the various crosses in my tester. So we've got reasonable coverage of 16, 32 and 64 bit targets, big and little endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other oddities.

The included tests are for RISC-V only because not all targets are going to have extraneous extensions. There are tests from coremark, x264 and GCC's BZ database. It probably wouldn't be hard to add aarch64 testcases. The BZs listed are improved by this patch for aarch64.

Given the amount of work Jivan and I have done, I'm not comfortable self-approving at this time. I'd much rather have another set of eyes on the code. Hopefully the code is documented well enough for that to be a useful exercise.

So, no need to work from Pago Pago for this patch. I may make another attempt at the eswin conditional move work while working virtually in Pago Pago though.

Thoughts, comments, recommendations?

Jeff