public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Post-register-allocation opportunitistic optimizer?
@ 2002-05-03 13:20 tm
  2002-05-03 14:02 ` Richard Henderson
  2002-05-03 14:23 ` Kazu Hirata
  0 siblings, 2 replies; 11+ messages in thread
From: tm @ 2002-05-03 13:20 UTC (permalink / raw)
  To: gcc; +Cc: kazu, joern.rennecke


I'm looking through H8/300h code, and I realize I want a
post-register-allocation opportunitistic optimizer.

To explain, I need to backtrack a bit. In the past, I've mentioned that
GCC handles high register pressure badly, and it should rerun the
optimizer over the original RTX with flags set to avoid generating new
pseudos.

I think this situation may be better handled by some sort of
post-register-allocation opportunitistic optimizer.

To explain simply, we would initially generate code which uses minial
scratch registers, then PRAOO would run after register allocation and
opportunistically replace slow code sequence which require no scratch
register with faster code sequences which require a scratch
register ONLY IF an unused hard register is available.

For example, GCC generates this code for a right shift by 8 on the
H8/300H:

        mov.w   e0,r2
        mov.b   r0h,r0l
        mov.b   r2l,r0h
        mov.b   r2h,r2l
        exts.w  r2
        mov.w   r2,e0

This is fast code but it uses an extra register (r2). It is undesirable if
the compiled function is complex and the register pressure is already
high. In a high register pressure case, we would probably want:

	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0

which is slower but avoids using a scratch register, and thus avoids
spilling a register.

I know the Hitachi SH has the same problem, because the earlier
implementations (SH1 and SH2) lack a barrel shifter and have only
instructions with fixed shift counts.

My first question is: are there other processors which must choose between
code which is "faster and uses scratch register" vs. "slower and no
scratch register"? 

I'm assuming other processors have the same problem. If so, it sounds
better to implement a generic solution rather than hack a 
MACHINE_DEPENDENT_REORG.

The second question is the appropriate implementation of such a feature.
I can think of a few different implementations:

1. Hack register allocation to handle this. This sounds ugly.

2. Hack combine to understand register pressure and rerun after
   reload. This also sounds ugly.

3. New optimizer pass which runs after global alloc which 
   opportunistically replaces slow sequences with fast sequences if hard
   registers are available.

Is #3 the right solution, or are there better solutions available?

Toshi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 13:20 Post-register-allocation opportunitistic optimizer? tm
@ 2002-05-03 14:02 ` Richard Henderson
  2002-05-03 14:50   ` tm
  2002-05-03 14:23 ` Kazu Hirata
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2002-05-03 14:02 UTC (permalink / raw)
  To: tm; +Cc: gcc, kazu, joern.rennecke

On Fri, May 03, 2002 at 01:12:47PM -0700, tm wrote:
> 3. New optimizer pass which runs after global alloc which 
>    opportunistically replaces slow sequences with fast sequences if hard
>    registers are available.

See -frename-registers and/or the documentation for define_peephole2.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 13:20 Post-register-allocation opportunitistic optimizer? tm
  2002-05-03 14:02 ` Richard Henderson
@ 2002-05-03 14:23 ` Kazu Hirata
  2002-05-03 14:38   ` law
  1 sibling, 1 reply; 11+ messages in thread
From: Kazu Hirata @ 2002-05-03 14:23 UTC (permalink / raw)
  To: tm; +Cc: gcc, joern.rennecke

Hi Toshi,

> For example, GCC generates this code for a right shift by 8 on the
> H8/300H:
> 
>         mov.w   e0,r2
>         mov.b   r0h,r0l
>         mov.b   r2l,r0h
>         mov.b   r2h,r2l
>         exts.w  r2
>         mov.w   r2,e0

I was thinking about the same thing.  Shifts that require loops are in
the same situation.  Also, if you want to use stw.l for argument push
on H8/S, you need to know what registers are available, but this might
require MACHINE_DEPENDENT_REORG.

> 3. New optimizer pass which runs after global alloc which 
>    opportunistically replaces slow sequences with fast sequences if hard
>    registers are available.

This sounds very reasonable to me.

Kazu Hirata

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 14:23 ` Kazu Hirata
@ 2002-05-03 14:38   ` law
  0 siblings, 0 replies; 11+ messages in thread
From: law @ 2002-05-03 14:38 UTC (permalink / raw)
  To: Kazu Hirata; +Cc: tm, gcc, joern.rennecke

In message <20020503.172316.55515951.kazu@cs.umass.edu>, Kazu Hirata writes:
 > Hi Toshi,
 > 
 > > For example, GCC generates this code for a right shift by 8 on the
 > > H8/300H:
 > > 
 > >         mov.w   e0,r2
 > >         mov.b   r0h,r0l
 > >         mov.b   r2l,r0h
 > >         mov.b   r2h,r2l
 > >         exts.w  r2
 > >         mov.w   r2,e0
 > 
 > I was thinking about the same thing.  Shifts that require loops are in
 > the same situation.  Also, if you want to use stw.l for argument push
 > on H8/S, you need to know what registers are available, but this might
 > require MACHINE_DEPENDENT_REORG.
 > 
 > > 3. New optimizer pass which runs after global alloc which 
 > >    opportunistically replaces slow sequences with fast sequences if hard
 > >    registers are available.
 > 
 > This sounds very reasonable to me.
Look at how peephole2 works.  It's designed to be able to replace one sequence
with another (possibly faster, smaller, whatever) when hard registers are
available.

jeff


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 14:02 ` Richard Henderson
@ 2002-05-03 14:50   ` tm
  2002-05-03 15:16     ` Peter Barada
  2002-05-03 15:46     ` Richard Henderson
  0 siblings, 2 replies; 11+ messages in thread
From: tm @ 2002-05-03 14:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc, kazu, joern.rennecke

On Fri, 3 May 2002, Richard Henderson wrote:

> On Fri, May 03, 2002 at 01:12:47PM -0700, tm wrote:
> > 3. New optimizer pass which runs after global alloc which 
> >    opportunistically replaces slow sequences with fast sequences if hard
> >    registers are available.
> 
> See -frename-registers and/or the documentation for define_peephole2.

define_peephole2 comes close, but doesn't quite solve the problem.

Consider a basic block contains multiple opportunistically optimizable
sequences, and two free hard registers.

If I understand correctly, define_peephole2 will optimize the first two
sequences (assuming they both use one hard reg apiece) and fail to
optimize the rest of the cases.

Ideally, the optimizer would sort the opportunities by benefit and only
perform the best N optimizations rather than simply optimize the first N
sequences.

Could peephole2 be modified to accomodate this, or is it better
implemented as a separate pass?

Toshi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 14:50   ` tm
@ 2002-05-03 15:16     ` Peter Barada
  2002-05-03 15:46     ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Barada @ 2002-05-03 15:16 UTC (permalink / raw)
  To: tm; +Cc: rth, gcc, kazu, joern.rennecke


>If I understand correctly, define_peephole2 will optimize the first two
>sequences (assuming they both use one hard reg apiece) and fail to
>optimize the rest of the cases.

I think peephole2 works backwards from the end of the function to the
front, so it would optimize the last two sequences...

-- 
Peter Barada                                   Peter.Barada@motorola.com
Wizard                                         781-852-2768 (direct)
WaveMark Solutions(wholly owned by Motorola)   781-270-0193 (fax)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 14:50   ` tm
  2002-05-03 15:16     ` Peter Barada
@ 2002-05-03 15:46     ` Richard Henderson
  2002-05-03 15:57       ` tm
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2002-05-03 15:46 UTC (permalink / raw)
  To: tm; +Cc: gcc, kazu, joern.rennecke

On Fri, May 03, 2002 at 02:42:02PM -0700, tm wrote:
> If I understand correctly, define_peephole2 will optimize the first two
> sequences (assuming they both use one hard reg apiece) and fail to
> optimize the rest of the cases.

Why do you think this?  The set of live registers is recomputed
for every independent peep2 optimization.  Thus they should all
be optimized.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 15:46     ` Richard Henderson
@ 2002-05-03 15:57       ` tm
  2002-05-03 16:22         ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: tm @ 2002-05-03 15:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc, kazu, joern.rennecke

On Fri, 3 May 2002, Richard Henderson wrote:

> On Fri, May 03, 2002 at 02:42:02PM -0700, tm wrote:
> > If I understand correctly, define_peephole2 will optimize the first two
> > sequences (assuming they both use one hard reg apiece) and fail to
> > optimize the rest of the cases.
> 
> Why do you think this?  The set of live registers is recomputed
> for every independent peep2 optimization.  Thus they should all
> be optimized.

Maybe I didn't communicate well.

Imagine a situation in a basic block with two hard registers free.
An analysis of the code reveals the following possible optimizations:

optimization    scratch hard regs       RTX_COST
opportunity         required            benefit
-----------------------------------------------
   #1                 1                   3
   #2                 1                   5
   #3                 2                  11

In this case, if you perform optimization #1 first, then there is only
one hard register free, and only optimization #2 can be performed for a
total benefit of 3 + 5 = 8.

However, better heuristics should decide to spend the two hard registers
on optimization #3 which yields a net benefit of 11, which is better than
optimizations #1 and #2 combined.

Toshi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 15:57       ` tm
@ 2002-05-03 16:22         ` Richard Henderson
  2002-05-03 16:54           ` tm
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2002-05-03 16:22 UTC (permalink / raw)
  To: tm; +Cc: gcc, kazu, joern.rennecke

On Fri, May 03, 2002 at 03:49:58PM -0700, tm wrote:
> Imagine a situation in a basic block with two hard registers free.
> An analysis of the code reveals the following possible optimizations:
> 
> optimization    scratch hard regs       RTX_COST
> opportunity         required            benefit
> -----------------------------------------------
>    #1                 1                   3
>    #2                 1                   5
>    #3                 2                  11
> 
> In this case, if you perform optimization #1 first, then there is only
> one hard register free,

Stop.  You are already incorrect.

The register used in a peep2 will _only_ be used within that single
peephole pattern, thus it is dead after the matched sequence, thus
there are two free registers for _every_ matched sequence.

If you're thinking of this in combination with some post-reload CSE
pass, peep2 won't do.  In fact it sounds like nothing but a better
register allocator will do.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 16:22         ` Richard Henderson
@ 2002-05-03 16:54           ` tm
  2002-05-03 17:06             ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: tm @ 2002-05-03 16:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc, kazu, joern.rennecke

On Fri, 3 May 2002, Richard Henderson wrote:

> On Fri, May 03, 2002 at 03:49:58PM -0700, tm wrote:
> > Imagine a situation in a basic block with two hard registers free.
> > An analysis of the code reveals the following possible optimizations:
> > 
> > optimization    scratch hard regs       RTX_COST
> > opportunity         required            benefit
> > -----------------------------------------------
> >    #1                 1                   3
> >    #2                 1                   5
> >    #3                 2                  11
> > 
> > In this case, if you perform optimization #1 first, then there is only
> > one hard register free,
> 
> Stop.  You are already incorrect.
> 
> The register used in a peep2 will _only_ be used within that single
> peephole pattern, thus it is dead after the matched sequence, thus
> there are two free registers for _every_ matched sequence.

Ah, I made a bad assumption. I understand this now.

So we can naively emit long sequences of single shifts, and have peep2
clean it up for us later if hard registers are available. Nice!

Toshi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Post-register-allocation opportunitistic optimizer?
  2002-05-03 16:54           ` tm
@ 2002-05-03 17:06             ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2002-05-03 17:06 UTC (permalink / raw)
  To: tm; +Cc: gcc, kazu, joern.rennecke

On Fri, May 03, 2002 at 04:46:03PM -0700, tm wrote:
> So we can naively emit long sequences of single shifts, and have peep2
> clean it up for us later if hard registers are available. Nice!

Um, except that peep2 only works with adjacent instructions.
You're probably better off modeling these as a single insn
through reload and splitting them with peep2.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2002-05-04  0:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-03 13:20 Post-register-allocation opportunitistic optimizer? tm
2002-05-03 14:02 ` Richard Henderson
2002-05-03 14:50   ` tm
2002-05-03 15:16     ` Peter Barada
2002-05-03 15:46     ` Richard Henderson
2002-05-03 15:57       ` tm
2002-05-03 16:22         ` Richard Henderson
2002-05-03 16:54           ` tm
2002-05-03 17:06             ` Richard Henderson
2002-05-03 14:23 ` Kazu Hirata
2002-05-03 14:38   ` law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).