We currently support counted loops on several targets by having the
loop-doloop pass search for suitable loops and convert them to use the
doloop_end pattern. Unfortunately, there are various machines which have
hardware loop support, but need additional tests and transformations,
during the final stage of compilation, to ensure we can make use of
their hardware.

For example, Blackfin has an LSETUP instruction which sets up a loop
start and loop end address; the hardware compares the program counter to
the loop end address register and automatically resets it to the loop
start address if the end is reached. The encoding of LSETUP has a
limited number of bits, which forces the compiler to ensure that an
upper bound for the loop's length is not exceeded. The loop end must
also come after the loop start, which sometimes requires us to reorder
the CFG.
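
To make the two Blackfin constraints concrete, here is a minimal
sketch of the kind of check involved. It is not code from the patch:
the struct, the function name and in particular the length limit are
hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit.  The real bound follows from the LSETUP
   encoding and is not stated here.  */
#define MAX_LOOP_LENGTH 1024

/* Simplified stand-in for a candidate loop; addresses are byte
   offsets within the function.  */
struct hw_loop_candidate
{
  size_t start_addr;  /* Address of the first insn of the body.  */
  size_t end_addr;    /* Address of the last insn of the body.  */
};

enum hw_loop_verdict
{
  HW_LOOP_OK,
  HW_LOOP_NEEDS_REORDER,  /* Loop end is not after loop start.  */
  HW_LOOP_TOO_LONG        /* Body exceeds the encodable length.  */
};

/* Decide whether LOOP can use a hardware loop as laid out.  */
static enum hw_loop_verdict
check_hw_loop (const struct hw_loop_candidate *loop)
{
  if (loop->end_addr <= loop->start_addr)
    return HW_LOOP_NEEDS_REORDER;
  if (loop->end_addr - loop->start_addr > MAX_LOOP_LENGTH)
    return HW_LOOP_TOO_LONG;
  return HW_LOOP_OK;
}
```

In this sketch a HW_LOOP_NEEDS_REORDER result would correspond to the
case where the CFG has to be rearranged before trying again.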

On C6X, we'd like to make use of the SPLOOP/SPKERNEL instructions, which
define a software-pipelined loop executed by the hardware. Instructions
after SPLOOP are copied into a limited-size loop buffer from which they
are re-executed; SPKERNEL indicates the end of the loop.

We have some target code that deals with this. Nathan Sidwell originally
wrote it for the mt port (then called ms1, now removed); it was then
reused for the Blackfin by Jie Zhang and myself. For C6X, I've decided
that there is sufficient commonality that we can move parts of the
machinery into a target-independent file, callable from the backends
with a set of hooks that do the actual transformations. This is what the
patch below does.
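
As a rough illustration of the split between generic machinery and
backend hooks, the shape is something like the sketch below. All
names here (hwloop_hooks, process_hw_loops, the toy_* functions) are
made up for this example and are not the interface in the patch; the
"toy" backend simply pretends that loops of up to 8 insns fit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-loop information.  */
struct loop_info
{
  int length;      /* Number of insns in the loop body.  */
  bool converted;  /* Set once a hardware loop has been emitted.  */
};

/* Hooks a backend would supply (hypothetical).  */
struct hwloop_hooks
{
  /* Try to convert LOOP; return true on success.  */
  bool (*optimize) (struct loop_info *loop);
  /* Called when conversion fails, e.g. to fall back to an
     ordinary decrement-and-branch sequence.  */
  void (*fail) (struct loop_info *loop);
};

/* Target-independent driver: walk the candidate loops and let the
   backend decide each one.  Returns the number converted.  */
static int
process_hw_loops (struct loop_info *loops, size_t n,
                  const struct hwloop_hooks *hooks)
{
  int converted = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (hooks->optimize (&loops[i]))
        {
          loops[i].converted = true;
          converted++;
        }
      else
        hooks->fail (&loops[i]);
    }
  return converted;
}

/* Toy backend: accept only loops of at most 8 insns.  */
static bool
toy_optimize (struct loop_info *loop)
{
  return loop->length <= 8;
}

static void
toy_fail (struct loop_info *loop)
{
  loop->converted = false;
}

static const struct hwloop_hooks toy_hooks = { toy_optimize, toy_fail };
```

A backend would then call process_hw_loops with its own hooks late in
compilation, keeping the loop discovery and bookkeeping in one place.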

There are some odd problems when regression testing on Blackfin right
now, but I've got two runs with identical results. Doing before/after
comparisons on a set of .i files, there is one case in my collection
where this patch produces different code; this seems to be merely a
difference between a long branch and a short branch. I'm not sure why
it happens, but it seems innocuous.

Ok?


Bernd