public inbox for gcc@gcc.gnu.org
* Post-register-allocation opportunistic optimizer?
From: tm @ 2002-05-03 13:20 UTC
  To: gcc; +Cc: kazu, joern.rennecke


I'm looking through H8/300H code, and I realize I want a
post-register-allocation opportunistic optimizer.

To explain, I need to backtrack a bit. In the past, I've mentioned that
GCC handles high register pressure badly, and it should rerun the
optimizer over the original RTX with flags set to avoid generating new
pseudos.

I think this situation may be better handled by some sort of
post-register-allocation opportunistic optimizer (PRAOO).

To explain simply, we would initially generate code that uses minimal
scratch registers. The post-register-allocation opportunistic optimizer
(PRAOO) would then run after register allocation and opportunistically
replace slow code sequences that require no scratch register with faster
code sequences that require a scratch register ONLY IF an unused hard
register is available.
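
A rough sketch of the decision such a pass would make for each candidate
sequence follows. This is only a toy model and does not use GCC's real
data structures: the `regset` bitmask, `NUM_HARD_REGS`, `struct candidate`
and `try_upgrade` are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC internals: one bit per hard register,
   set when that register is live across the sequence being examined.  */
typedef uint32_t regset;

enum { NUM_HARD_REGS = 8 };   /* assumed register count, for illustration */

/* A slow sequence that needs no scratch register, together with what
   its faster replacement would require.  */
struct candidate {
    int scratch_needed;    /* scratch registers the fast form requires */
    bool use_fast;         /* set by try_upgrade */
    regset scratch_used;   /* free registers claimed for the fast form */
};

/* Switch to the fast form only if enough hard registers are unused.
   Returns true when the replacement is possible.  */
bool try_upgrade(struct candidate *c, regset live)
{
    regset claimed = 0;
    int found = 0;

    assert(c->scratch_needed >= 0 && c->scratch_needed <= NUM_HARD_REGS);

    for (int r = 0; r < NUM_HARD_REGS && found < c->scratch_needed; r++) {
        if (!(live & ((regset)1 << r))) {   /* register r is free here */
            claimed |= (regset)1 << r;
            found++;
        }
    }

    c->use_fast = (found == c->scratch_needed);
    c->scratch_used = c->use_fast ? claimed : 0;
    return c->use_fast;
}
```

The point of the sketch is that the slow form stays in place by default,
so a failed lookup never causes a spill; the pass can only improve on what
register allocation already produced.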

For example, GCC generates this code for a right shift by 8 on the
H8/300H:

        mov.w   e0,r2
        mov.b   r0h,r0l
        mov.b   r2l,r0h
        mov.b   r2h,r2l
        exts.w  r2
        mov.w   r2,e0

This is fast code but it uses an extra register (r2). It is undesirable if
the compiled function is complex and the register pressure is already
high. In a high register pressure case, we would probably want:

	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0

which is slower but avoids using a scratch register, and thus avoids
spilling a register.
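
To check that the two sequences really compute the same thing, here is a
small C model of both. It is only a sketch: it assumes er0 is a 32-bit
register whose upper half is e0 and whose lower half is r0 (bytes r0h and
r0l), and that r2 is a 16-bit scratch with bytes r2h and r2l. The function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models the six-instruction byte-shuffle sequence; `r2` stands in for
   the scratch register.  */
uint32_t shift8_with_scratch(uint32_t er0)
{
    uint16_t e0 = (uint16_t)(er0 >> 16);
    uint16_t r0 = (uint16_t)er0;
    uint16_t r2;

    r2 = e0;                                                /* mov.w e0,r2   */
    r0 = (uint16_t)((r0 & 0xFF00u) | (r0 >> 8));            /* mov.b r0h,r0l */
    r0 = (uint16_t)((r0 & 0x00FFu) | ((r2 & 0xFFu) << 8));  /* mov.b r2l,r0h */
    r2 = (uint16_t)((r2 & 0xFF00u) | (r2 >> 8));            /* mov.b r2h,r2l */
    r2 = (r2 & 0x80u) ? (uint16_t)(0xFF00u | (r2 & 0xFFu))  /* exts.w r2     */
                      : (uint16_t)(r2 & 0xFFu);
    e0 = r2;                                                /* mov.w r2,e0   */

    return ((uint32_t)e0 << 16) | r0;
}

/* Models eight single-bit arithmetic right shifts; no scratch needed.  */
uint32_t shift8_one_bit_at_a_time(uint32_t er0)
{
    for (int i = 0; i < 8; i++) {
        uint32_t sign = er0 & 0x80000000u;
        er0 = (er0 >> 1) | sign;                            /* shar.l er0    */
    }
    return er0;
}

/* Spot-check that both forms agree on one input.  */
void check_agree(uint32_t x)
{
    assert(shift8_with_scratch(x) == shift8_one_bit_at_a_time(x));
}
```

Both functions implement an arithmetic right shift by 8; the only
difference is that the first needs the extra 16-bit temporary.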

I know the Hitachi SH has the same problem, because the earlier
implementations (SH1 and SH2) lack a barrel shifter and have only
instructions with fixed shift counts.

My first question is: are there other processors which must choose between
code which is "faster and uses a scratch register" vs. "slower and uses no
scratch register"?

I'm assuming other processors have the same problem. If so, it sounds
better to implement a generic solution rather than hack a 
MACHINE_DEPENDENT_REORG.

The second question is the appropriate implementation of such a feature.
I can think of a few different implementations:

1. Hack register allocation to handle this. This sounds ugly.

2. Hack combine to understand register pressure and rerun after
   reload. This also sounds ugly.

3. New optimizer pass which runs after global alloc which 
   opportunistically replaces slow sequences with fast sequences if hard
   registers are available.

Is #3 the right solution, or are there better solutions available?

Toshi
