This is work originally started by Joern @ Embecosm.

There's been a long standing sense that we're generating too many 
sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
focused on a relatively narrow part of the extension problem.

What Joern's patch does is introduce a new pass which tracks liveness of 
chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
and 32..63.

If it encounters a sign/zero extend that sets bits that are never read, 
then it replaces the sign/zero extension with a narrowing subreg.  The 
narrowing subreg usually gets eliminated by subsequent passes (it's just 
a copy after all).

Jivan has done some analysis and found that it eliminates roughly 1% of 
the dynamic instruction stream for x264 as well as some redundant 
extensions in the coremark benchmark (both on rv64).  In my own testing 
as I worked through issues on other architectures I clearly saw it 
helping in various places within GCC itself or in the testsuite.

The basic structure is to first do a fairly standard liveness analysis 
on the chunks, seeding original state with the liveness data from DF. 
Once that's stable, we do a final pass to identify the useless 
extensions and transform them into narrowing subregs.

A few key points to remember.

For destination processing it is always safe to ignore a destination. 
Ignoring a destination merely means that whatever was live after the 
given insn will continue to be live before the insn.  What is not safe 
is to clear a bit in the LIVENOW bitmap for a destination chunk that is 
not set.  This comes into play with things like STRICT_LOW_PART.

For source processing the safe thing to do is to set all the chunks in a 
register as live.  It is never safe to fail to process a source operand.

When a destination object is not fully live, we try to transfer that 
limited liveness to the source operands.  So for example if bits 16..63 
are dead in a destination of a PLUS, we need not mark bits 16..63 as 
live for the source operands.  We have to be careful -- consider a shift 
count on a target without SHIFT_COUNT_TRUNCATED set.  So we have both a 
list of RTL codes where we can transfer liveness and a few codes where 
one of the operands may need to be fully live (ex, a shift count) while 
the other input may not need to be fully live (value left shifted).

Locally we have had this enabled at -O1 and above to encourage testing, 
but I'm thinking that for the trunk enabling at -O2 and above is the 
right thing to do.

This has (of course) been tested on rv64.  It's also been bootstrapped 
and regression tested on x86.  Bootstrap and regression tested (C only) 
for m68k, sh4, sh4eb, alpha.  Earlier versions were also bootstrapped 
and regression tested on ppc, hppa and s390x (C only for those as well). 
  It's also been tested on the various crosses in my tester.  So we've 
got reasonable coverage of 16, 32 and 64 bit targets, big and little 
endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other 
oddities.

The included tests are for RISC-V only because not all targets are going 
to have extraneous extensions.   There's tests from coremark, x264 and 
GCC's bz database.  It probably wouldn't be hard to add aarch64 
testscases.  The BZs listed are improved by this patch for aarch64.

Given the amount of work Jivan and I have done, I'm not comfortable 
self-approving at this time.  I'd much rather have another set of eyes 
on the code.  Hopefully the code is documented well enough for that to 
be useful exercise.

So, no need to work from Pago Pago for this patch.  I may make another 
attempt at the eswin conditional move work while working virtually in 
Pago Pago though.

Thoughts, comments, recommendations?

Jeff