Ventana has had a variant of this patch from Andrew W. in its tree for
at least a year.  I'm dusting it off and submitting it on Andrew's behalf.

There are multiple approaches we could use here.

First, we could make $ra fixed and use it as the scratch register for the
long branch sequences.
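To make the first approach a bit more concrete, here's a rough C sketch of
how a backend might pick a sequence for a conditional branch based on its
distance. The helper names (classify_branch, emit_beq) are hypothetical and
this is not the code in GCC's RISC-V backend. It assumes the standard reaches
of the B-type encoding (about +/-4 KiB) and jal (about +/-1 MiB), and uses the
GNU assembler's "jump label,temp" pseudo-instruction (auipc + jalr) for the
farthest case. Only that last case needs a scratch register, which is exactly
where a fixed $ra makes life easy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reach of RISC-V control-transfer encodings, in bytes relative to the
   instruction itself.  Offsets are assumed to be even.  */
#define BTYPE_MIN (-4096)    /* B-type: signed 13-bit immediate.  */
#define BTYPE_MAX 4094
#define JAL_MIN (-1048576)   /* J-type (jal): signed 21-bit immediate.  */
#define JAL_MAX 1048574

typedef enum {
  BRANCH_SHORT,          /* Conditional branch reaches the target.  */
  BRANCH_VIA_JAL,        /* Inverted branch around a jal: no scratch.  */
  BRANCH_VIA_AUIPC_JALR  /* Inverted branch around auipc+jalr: scratch.  */
} branch_kind;

/* OFFSET is the displacement from the conditional branch to its target.  */
static branch_kind
classify_branch (int64_t offset)
{
  if (offset >= BTYPE_MIN && offset <= BTYPE_MAX)
    return BRANCH_SHORT;

  /* The unconditional jump sits 4 bytes after the inverted branch.  */
  int64_t jump_offset = offset - 4;
  if (jump_offset >= JAL_MIN && jump_offset <= JAL_MAX)
    return BRANCH_VIA_JAL;

  /* Assumes the target is within the +/-2 GiB reach of auipc+jalr.  */
  return BRANCH_VIA_AUIPC_JALR;
}

/* Write the sequence for "beq a0,a1,TARGET" into BUF.  */
static void
emit_beq (char *buf, size_t len, int64_t offset, const char *target)
{
  switch (classify_branch (offset))
    {
    case BRANCH_SHORT:
      snprintf (buf, len, "\tbeq\ta0,a1,%s\n", target);
      break;
    case BRANCH_VIA_JAL:
      /* Invert the condition and hop over a plain jump (links to x0).  */
      snprintf (buf, len, "\tbne\ta0,a1,1f\n\tj\t%s\n1:\n", target);
      break;
    case BRANCH_VIA_AUIPC_JALR:
      /* "jump label,ra" expands to auipc ra,... ; jalr zero,...(ra).
         With ra fixed, nothing live can be clobbered here.  */
      snprintf (buf, len, "\tbne\ta0,a1,1f\n\tjump\t%s,ra\n1:\n", target);
      break;
    }
}
```

The appeal of this approach is that the scratch register is known up front,
so nothing in the allocator or the branch patterns has to change; the price
is one fewer allocatable GPR.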

Second, we could add a match_scratch to all the conditional branch
patterns and let the register allocator assign the scratch register
from the pool of GPRs.

Third, we could do register scavenging.  This usually works, though it
can get complex in some scenarios.

Fourth, we could use trampolines for extended reach.

Andrew's original patch did a bit of the first approach (make $ra fixed)
and mostly the second approach.  The net result was probably the worst
in terms of impact on code generation: we lost a register *and* forced
every branch instruction to get a scratch register allocated.

I had expected the second approach to produce better code than the
first, but that wasn't the case in practice.  It's probably a
combination of allocating a GPR at every branch point (even with a
lifetime of a single insn, there's a cost) and perhaps the additional
operands on conditional branches spoiling simplistic pattern matching
in one or more passes.

In addition to performing better as measured by dynamic instruction
counts, the first approach is significantly simpler to implement.  Given
those two positives, that's what I've chosen to go with.  Yes, it does
remove $ra from the set of allocatable registers, but the impact of that
is *tiny*.

If someone wants to dive into one of the other approaches to address a
real-world impact, that's great.  If that happens, I would strongly
suggest also evaluating perlbench from SPEC CPU 2017.  It seems
particularly sensitive to approach #2's impact on code generation.

I've built & regression tested this variant on the vt1 configuration 
without regressions.  Earlier versions have been bootstrapped as well.

Pushed to the trunk,

Jeff