public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: Reload patch to improve 386 code
@ 1997-08-18 14:53 Toon Moene
  0 siblings, 0 replies; 33+ messages in thread
From: Toon Moene @ 1997-08-18 14:53 UTC (permalink / raw)
  To: egcs

>  I have come up with a patch for the reload pass that can
>  improve the generated code on the i386 quite dramatically.
>  The problem with reload is that when it decides to spill
>  a hard register, it spills _all_ pseudos allocated to
>  this hard register.  This is often not necessary, it is
>  possible to use register life information to determine
>  that certain pseudos are not live across any instructions
>  that need reloads, and thus need not be spilled.

Isn't this something that would have been solved [ coniunctivus  
irrealis ! ] by live range splitting ?

"pseudo's that are not live across instructions" sounds to me as  
the definition of the place to split the liveness range.

Or am I missing something ?

OTOH, "beter een vogel in de hand dan tien in de lucht" (sorry,  
couldn't resist).

Cheers,
Toon.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-07  9:48       ` Bernd Schmidt
  1998-09-07 19:09         ` Joern Rennecke
@ 1998-09-07 19:09         ` Jeffrey A Law
  1 sibling, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-07 19:09 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: meissner, toon, egcs

  In message < Pine.GSO.4.02A.9809071547250.23612-100000@matlock.informatik.rwth-aachen.de >you write:
  > It really shouldn't. Inheritance only looks at the insns that reload is
  > already done with, and the one currently being processed.  For those
  > which reload has completed, the register life information calculated by
  > my patch is never referenced again, so the inheritance code can do
  > whatever it wants to them.
Ok.  We're probably OK then.


  > > We actually have most of the stuff to implement this now.  We just need
  > > to set the alias set for such MEMs so that they have a different set
  > > than all the MEMs created before reload.
  > 
  > We'd need to be a bit careful about this, since some of these MEMs might be
  > replaced (for example) in instruction splitting later on, and the special
  > "this MEM made by reload" alias set must be preserved during such operation s.
Actually if it is not preserved, then we're still OK since the alias
set would be set to zero, indicating that we don't know what alias
set the split mems belong to.

While nonoptimal, it will work until we find a way to preserve the
alias set in the split MEMs (which may be trivial in this case, I
don't know).

One of the things we need to do throughout the compiler is propagate
the alias set information through to more places.  I suspect that
during the various optimization phases we're losing some alias set
info provided by the front end.

jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-07  9:48       ` Bernd Schmidt
@ 1998-09-07 19:09         ` Joern Rennecke
  1998-09-07 18:26           ` Jeffrey A Law
  1998-09-07 19:09         ` Jeffrey A Law
  1 sibling, 1 reply; 33+ messages in thread
From: Joern Rennecke @ 1998-09-07 19:09 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: law, meissner, toon, egcs

> > Then again inheritance happens just before we start emitting the
> > reload insns themselves, so maybe it doesn't conflict with your code.
> 
> It really shouldn't. Inheritance only looks at the insns that reload is
> already done with, and the one currently being processed.  For those
> which reload has completed, the register life information calculated by
> my patch is never referenced again, so the inheritance code can do
> whatever it wants to them.

You'll have to add some code to tell the inheritance code when a reload
register becomes unavailable where it becomes used by a pseudo.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-07 19:09         ` Joern Rennecke
@ 1998-09-07 18:26           ` Jeffrey A Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-07 18:26 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Bernd Schmidt, meissner, toon, egcs

  In message < 199809080059.BAA01948@phal.cygnus.co.uk >you write:
  > > > Then again inheritance happens just before we start emitting the
  > > > reload insns themselves, so maybe it doesn't conflict with your code.
  > > 
  > > It really shouldn't. Inheritance only looks at the insns that reload is
  > > already done with, and the one currently being processed.  For those
  > > which reload has completed, the register life information calculated by
  > > my patch is never referenced again, so the inheritance code can do
  > > whatever it wants to them.
  > 
  > You'll have to add some code to tell the inheritance code when a reload
  > register becomes unavailable where it becomes used by a pseudo.
Yup.  But that shouldn't be too difficult.  In forget_old_reloads_1
we look at reg_renumber for the pseudo and invalidate things as needed.

jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-05  4:45     ` Jeffrey A Law
  1998-09-05 15:49       ` Joern Rennecke
@ 1998-09-07  9:48       ` Bernd Schmidt
  1998-09-07 19:09         ` Joern Rennecke
  1998-09-07 19:09         ` Jeffrey A Law
  1 sibling, 2 replies; 33+ messages in thread
From: Bernd Schmidt @ 1998-09-07  9:48 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: meissner, toon, egcs

>   In message < Pine.GSO.4.02A.9809041107001.19368-100000@matlock.informatik.rwth-aachen.de >you write:
>   > Inheritance doesn't seem to cause problems with my patch; I've never touched
>   > it and did not run into bugs.
> Well, when dealing with reload "it seems to work for me" isn't enough :-)
> 
> The primary issue is that when a reload is inherited its lifetime is
> extended, possibly beyond the original insn where it was used as a
> reload reg.
> 
> So, when we inherit reloads, we have to make sure to note that the
> reloadreg is marked as live for the longer range.
> 
> Then again inheritance happens just before we start emitting the
> reload insns themselves, so maybe it doesn't conflict with your code.

It really shouldn't. Inheritance only looks at the insns that reload is
already done with, and the one currently being processed.  For those
which reload has completed, the register life information calculated by
my patch is never referenced again, so the inheritance code can do
whatever it wants to them.

> We actually have most of the stuff to implement this now.  We just need
> to set the alias set for such MEMs so that they have a different set
> than all the MEMs created before reload.

We'd need to be a bit careful about this, since some of these MEMs might be
replaced (for example) in instruction splitting later on, and the special
"this MEM made by reload" alias set must be preserved during such operations.

Bernd


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-06  1:09           ` Mark Mitchell
@ 1998-09-06  1:09             ` Jeffrey A Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-06  1:09 UTC (permalink / raw)
  To: mark; +Cc: amylaar, crux, meissner, toon, egcs

  In message < 199809060428.VAA02085@smtp.earthlink.net >you write:
  > I don't see why the base address tracking code should have to do
  > anything.  My plan was just to use different alias sets for each spill
  > register (we have 2^32 of them, after all!).  If you like, you could
  > reuse these from function to function, so the maximum number of alias
  > sets used up this was would be the number of spills in any one
  > function.  My guess is that if that gets close to 4 billion, we have
  > worse problems.
  > 
  > I'd be happy (delighted, even) to implement this, but I'd like a test
  > case that someone things will benefit.  (I am in *no* way doubting the
  > existence of such a thing, but having one would allow me to verify my
  > work.) 
Just seemed easier to have a single alias set for spills.  I'm not
particularly partial to either solution.  They should both work.

As for a testcase.  Who knows.  I would think it would be reasonably
straightforward to build one.

ANy benefit right now would be in sched2 and possibly reload_cse since
we don't have generalized spill code motion yet.

Presumably we'd need a loop and something in the loop needs to be
spilled to the stack and we need unrelated memory accesses throuh a
pointer which we know nothing about.

jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-05 15:44         ` Jeffrey A Law
@ 1998-09-06  1:09           ` Mark Mitchell
  1998-09-06  1:09             ` Jeffrey A Law
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Mitchell @ 1998-09-06  1:09 UTC (permalink / raw)
  To: law; +Cc: amylaar, crux, meissner, toon, egcs

>>>>> "Jeffrey" == Jeffrey A Law <law@cygnus.com> writes:

    Jeffrey>   In message < 199809052204.XAA14607@phal.cygnus.co.uk >you
    Jeffrey> write:
    >> > We actually have most of the stuff to implement this now.  We
    >> just need > to set the alias set for such MEMs so that they
    >> have a different set > than all the MEMs created before reload.
    >> 
    >> It's not quite that simple.  You also have to know that
    >> different stack slots don't alias each other.

    Jeffrey> By setting the alias set, we disambiguate all the spill
    Jeffrey> slots from all the other pointers in the code.

    Jeffrey> We then let the existing base address tracking code
    Jeffrey> disambiguate the stack slots from each other.

I don't see why the base address tracking code should have to do
anything.  My plan was just to use different alias sets for each spill
register (we have 2^32 of them, after all!).  If you like, you could
reuse these from function to function, so the maximum number of alias
sets used up this was would be the number of spills in any one
function.  My guess is that if that gets close to 4 billion, we have
worse problems.

I'd be happy (delighted, even) to implement this, but I'd like a test
case that someone things will benefit.  (I am in *no* way doubting the
existence of such a thing, but having one would allow me to verify my
work.) 

    Jeffrey> jeff

-- 
Mark Mitchell 			mark@markmitchell.com
Mark Mitchell Consulting	http://www.markmitchell.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-05  4:45     ` Jeffrey A Law
@ 1998-09-05 15:49       ` Joern Rennecke
  1998-09-05 15:44         ` Jeffrey A Law
  1998-09-07  9:48       ` Bernd Schmidt
  1 sibling, 1 reply; 33+ messages in thread
From: Joern Rennecke @ 1998-09-05 15:49 UTC (permalink / raw)
  To: law; +Cc: crux, meissner, toon, egcs

> We actually have most of the stuff to implement this now.  We just need
> to set the alias set for such MEMs so that they have a different set
> than all the MEMs created before reload.

It's not quite that simple.  You also have to know that different stack
slots don't alias each other.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-05 15:49       ` Joern Rennecke
@ 1998-09-05 15:44         ` Jeffrey A Law
  1998-09-06  1:09           ` Mark Mitchell
  0 siblings, 1 reply; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-05 15:44 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: crux, meissner, toon, egcs

  In message < 199809052204.XAA14607@phal.cygnus.co.uk >you write:
  > > We actually have most of the stuff to implement this now.  We just need
  > > to set the alias set for such MEMs so that they have a different set
  > > than all the MEMs created before reload.
  > 
  > It's not quite that simple.  You also have to know that different stack
  > slots don't alias each other.
By setting the alias set, we disambiguate all the spill slots from
all the other pointers in the code.

We then let the existing base address tracking code disambiguate the
stack slots from each other.

jeff


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-04  5:05   ` Bernd Schmidt
  1998-09-04 22:36     ` Joern Rennecke
@ 1998-09-05  4:45     ` Jeffrey A Law
  1998-09-05 15:49       ` Joern Rennecke
  1998-09-07  9:48       ` Bernd Schmidt
  1 sibling, 2 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-05  4:45 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: meissner, toon, egcs

  In message < Pine.GSO.4.02A.9809041107001.19368-100000@matlock.informatik.rwth-aachen.de >you write:
  > Inheritance doesn't seem to cause problems with my patch; I've never touched
  > it and did not run into bugs.
Well, when dealing with reload "it seems to work for me" isn't enough :-)

The primary issue is that when a reload is inherited its lifetime is
extended, possibly beyond the original insn where it was used as a
reload reg.

So, when we inherit reloads, we have to make sure to note that the
reloadreg is marked as live for the longer range.

Then again inheritance happens just before we start emitting the
reload insns themselves, so maybe it doesn't conflict with your code.


  > I just think it's horribly complicated and
  > could be done somewhat cleaner, and more reliable.
It certainly needs better documentation.  Joern has code to make it
find more opportunities, but they're still being cleaned up.  In the
process he has improved the docs a little.

  > case.  The reload_cse_regs code and the inheritance code do some very similar
  > things, and I think it would be better to have only one piece of code that
  > handles everything in a general fashion.
They do similar things, but work in fundamentally different ways.  I
do not see the need to tie them together in one single general
optimization.

In fact, the better reload inheritance works, the better reload cse
will work.

  > The only major obstacle I see is that currently the MEMs made by reload for
  > spill slots in the stack frame don't get handled very well by the alias code.
  > If alias.c knew that these don't alias with other MEMs, reload_cse_regs wou ld
  > immediately be much more powerful.
We actually have most of the stuff to implement this now.  We just need
to set the alias set for such MEMs so that they have a different set
than all the MEMs created before reload.



jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-04  5:05   ` Bernd Schmidt
@ 1998-09-04 22:36     ` Joern Rennecke
  1998-09-05  4:45     ` Jeffrey A Law
  1 sibling, 0 replies; 33+ messages in thread
From: Joern Rennecke @ 1998-09-04 22:36 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: law, meissner, toon, egcs

> Inheritance doesn't seem to cause problems with my patch; I've never touched
> it and did not run into bugs.  I just think it's horribly complicated and
> could be done somewhat cleaner, and more reliable.  Just a few days ago I ran
> across a piece of code for which reload generated twice as many instructions
> as necessary, just because the inheritance code didn't catch a particular
> case.  The reload_cse_regs code and the inheritance code do some very similar
> things, and I think it would be better to have only one piece of code that
> handles everything in a general fashion.

I have patches to make the inheritance perform better and more consistently.

> The only major obstacle I see is that currently the MEMs made by reload for
> spill slots in the stack frame don't get handled very well by the alias code.
> If alias.c knew that these don't alias with other MEMs, reload_cse_regs would
> immediately be much more powerful.

reload_cse_regs can't do register re-assignments, and it doesn't see when
a sequence of instruction loads a register that it has clobbered first.
Actually, it doesn't even begin to understand multi-instruction sequences
that load a value.  It is really a weaker concept than reload inheritance.
The only time when it performs better than a good reload inheritance is
when the lifetime of reloads becomes muddled (e.g. when stuff is merged to
RELOAD_OTHER, and there are other reloads for the same insn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-04  5:03 ` Jeffrey A Law
  1998-09-04  5:05   ` Bernd Schmidt
@ 1998-09-04 21:38   ` Joern Rennecke
  1998-09-04 21:38     ` Jeffrey A Law
  1 sibling, 1 reply; 33+ messages in thread
From: Joern Rennecke @ 1998-09-04 21:38 UTC (permalink / raw)
  To: law; +Cc: crux, meissner, toon, egcs

> While reload_cse_regs can help this stuff, I'm not sure that it totally
> eliminates the need for the reload inheritance stuff.  In fact, I'm
> sure Joern can show you lots of case where improving inheritance leads
> to better code.

Actually, it's not that simple.  The test cases are confidential, and I
don't have time right now to look for other test cases or synthesize
artifical ones.

> So, I'm not sure it's time to ditch the inheritance code yet.  Given
> the structure of the locally spilling reload code, I do see how reload
> inheriting gets noticably more complicated.
> 
> One thing we should try is a cook off between the locally spilling reload
> code (with inheritance disabled) and the existing reload code.  If
> the locally spilling reloader generally wins, I'll support disabling
> inheritance to get the benefit of local spilling.  Then we can go
> back later and try to make inheritance work with your reload code.


This is not acceptable for the SH target.  Reload inheritance is of paramount
importance.  The x86 doesn't suffer from these problems because it's so
CISCy that almost every address generated for a typical C program is valid...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-04 21:38   ` Joern Rennecke
@ 1998-09-04 21:38     ` Jeffrey A Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-04 21:38 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: crux, meissner, toon, egcs

  In message < 199809041944.UAA32705@phal.cygnus.co.uk >you write:

  > > So, I'm not sure it's time to ditch the inheritance code yet.  Given
  > > the structure of the locally spilling reload code, I do see how reload
  > > inheriting gets noticably more complicated.
  > > 
  > > One thing we should try is a cook off between the locally spilling reload
  > > code (with inheritance disabled) and the existing reload code.  If
  > > the locally spilling reloader generally wins, I'll support disabling
  > > inheritance to get the benefit of local spilling.  Then we can go
  > > back later and try to make inheritance work with your reload code.
  > 
  > 
  > This is not acceptable for the SH target.  Reload inheritance is of paramount
  > importance.  The x86 doesn't suffer from these problems because it's so
  > CISCy that almost every address generated for a typical C program is valid.
Please reread my paragraph again.

We will look at both and pick whichever gives better overall results.
That may mean specific cases or specific targets are hurt.  We will
have to go back and fix them.

If it turns out that most targets benefit, but the SH loses, then I'm
going to go with the new code.  Sorry, but the needs of the any 
single port alone do not control the direction of the egcs project.


jeff


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1998-09-04  5:03 ` Jeffrey A Law
@ 1998-09-04  5:05   ` Bernd Schmidt
  1998-09-04 22:36     ` Joern Rennecke
  1998-09-05  4:45     ` Jeffrey A Law
  1998-09-04 21:38   ` Joern Rennecke
  1 sibling, 2 replies; 33+ messages in thread
From: Bernd Schmidt @ 1998-09-04  5:05 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: meissner, toon, egcs

> While reload_cse_regs can help this stuff, I'm not sure that it totally
> eliminates the need for the reload inheritance stuff.  In fact, I'm
> sure Joern can show you lots of case where improving inheritance leads
> to better code.
> 
> So, I'm not sure it's time to ditch the inheritance code yet.  Given
> the structure of the locally spilling reload code, I do see how reload
> inheriting gets noticably more complicated.
> 
> One thing we should try is a cook off between the locally spilling reload
> code (with inheritance disabled) and the existing reload code.  If
> the locally spilling reloader generally wins, I'll support disabling
> inheritance to get the benefit of local spilling.  Then we can go
> back later and try to make inheritance work with your reload code.
> 
> [ Then again, if you've made inheritance work, then there's no need
>   for the cook off :-)  I guess I'll find out when I start actually
>   looking at the code. ]

Inheritance doesn't seem to cause problems with my patch; I've never touched
it and did not run into bugs.  I just think it's horribly complicated and
could be done somewhat cleaner, and more reliable.  Just a few days ago I ran
across a piece of code for which reload generated twice as many instructions
as necessary, just because the inheritance code didn't catch a particular
case.  The reload_cse_regs code and the inheritance code do some very similar
things, and I think it would be better to have only one piece of code that
handles everything in a general fashion.
The only major obstacle I see is that currently the MEMs made by reload for
spill slots in the stack frame don't get handled very well by the alias code.
If alias.c knew that these don't alias with other MEMs, reload_cse_regs would
immediately be much more powerful.

Bernd


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
       [not found] <Pine.SOL.3.90.970819095143.291G-100000@starsky.informatik.rwth-aachen.de>
@ 1998-09-04  5:03 ` Jeffrey A Law
  1998-09-04  5:05   ` Bernd Schmidt
  1998-09-04 21:38   ` Joern Rennecke
  0 siblings, 2 replies; 33+ messages in thread
From: Jeffrey A Law @ 1998-09-04  5:03 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: meissner, toon, egcs

Another reply to an old message from Bernd...

  In message <Pine.SOL.3.90.970819095143.291G-100000@starsky.informatik.rwth-aachen.de>you write:
  > Then, there are some simplifications that could be done. I don't like the
  > inheritance code, find_equiv_reg and all that. IMHO reload shouldn't try to
  > be very clever about this sort of thing - the reload_cse_regs pass can be made
  > more clever.
While reload_cse_regs can help this stuff, I'm not sure that it totally
eliminates the need for the reload inheritance stuff.  In fact, I'm
sure Joern can show you lots of case where improving inheritance leads
to better code.

So, I'm not sure it's time to ditch the inheritance code yet.  Given
the structure of the locally spilling reload code, I do see how reload
inheriting gets noticably more complicated.

One thing we should try is a cook off between the locally spilling reload
code (with inheritance disabled) and the existing reload code.  If
the locally spilling reloader generally wins, I'll support disabling
inheritance to get the benefit of local spilling.  Then we can go
back later and try to make inheritance work with your reload code.

[ Then again, if you've made inheritance work, then there's no need
  for the cook off :-)  I guess I'll find out when I start actually
  looking at the code. ]


jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-21 16:51 Problems on PowerPC David Edelsohn
@ 1997-08-21 17:37 ` Bernd Schmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-21 17:37 UTC (permalink / raw)
  To: egcs

> You are right, I mixed up something here.  What I actually saw was when
> an inherit was missed, the new reload - or its secondary reloads -
> would clobber other reload registers, which could otherwise have been used
> for subsequent inherits.
> reload_cse coudn't clean up the mess - it just changed the reload for the
> missed inherit from a memory-register to a register-register move.
> It isn't clever enough to reassign reload registers and then reclaim the
> freed register for further cse.

There are a zillion variations of this kind of problem. The following
code, or something that looks similar, is generated rather frequently:

movl 16(%esp),%eax
movl (%eax),%eax
movl 16(%esp),%edx

Another problem was mentioned today: the compiler really should move
constants that are used multiple times into registers if possible.  These
problems are similar, maybe they can be addressed together.  Again, I
think reload_cse_regs might be the right place to add cleverness about this.
I'll send some initial patches tomorrow or so.

Bernd

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-21 16:51 Joern Rennecke
  0 siblings, 0 replies; 33+ messages in thread
From: Joern Rennecke @ 1997-08-21 16:51 UTC (permalink / raw)
  To: egcs

> > The problem with this is that you might have additional spills if you
> > remove reload inherits - when you inherit a reload that needs a secondary
> > reload, you extend the lifetime of only one reload register, instead of needing
> > two (or more, when a tertiary reload would be needed too.)
> 
> Are you sure? The inheritance code in choose_reload_regs is run after all
> spilling has been done, so I don't think removing it could cause additional
> spills.


You are right, I mixed up something here.  What I actually saw was when
an inherit was missed, the new reload - or its secondary reloads -
would clobber other reload registers, which could otherwise have been used
for subsequent inherits.
reload_cse coudn't clean up the mess - it just changed the reload for the
missed inherit from a memory-register to a register-register move.
It isn't clever enough to reassign reload registers and then reclaim the
freed register for further cse.

> The reason I'm not happy with the inheritance code is that find_equiv_reg
> is overly conservative; e.g. it misses cases like this:
> 
> movl %eax,16(%esp)
> movl %eax,some_symbol
> addl 16(%esp),%ebx
> 
> where the addl could inherit the value in %eax. My idea is to teach

I think my inheritance patches would fix that too.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-21 15:20 egcs repository Joel Sherrill
@ 1997-08-21 15:47 ` Bernd Schmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-21 15:47 UTC (permalink / raw)
  To: egcs

> 
> The problem with this is that you might have additional spills if you
> remove reload inherits - when you inherit a reload that needs a secondary
> reload, you extend the lifetime of only one reload register, instead of needing
> two (or more, when a tertiary reload would be needed too.)

Are you sure? The inheritance code in choose_reload_regs is run after all
spilling has been done, so I don't think removing it could cause additional
spills.
The reason I'm not happy with the inheritance code is that find_equiv_reg
is overly conservative; e.g. it misses cases like this:

movl %eax,16(%esp)
movl %eax,some_symbol
addl 16(%esp),%ebx

where the addl could inherit the value in %eax. My idea is to teach
reload_cse_regs about these cases, and if it can be made clever enough,
maybe (hopefully) the inheritance code in choose_reload_regs will become
redundant.

> Moreover, if you remove reload inherits from / to memory, you might end
> up with code that looks like it could have an alias problem for doing cse on
> it, and reload_cse can't do anything.

I'm aware of that. So far I have a patch that makes reload keep track of
all memory locations in the frame that it creates and teaches the alias
analysis that those can't conflict with any other memory references.  The
patch is very simple, only a couple of lines, and eliminates that part of
the problem completely from what I have seen. It also reduces dependencies
between insns, so it would be good to have something like this for the
benefit of the scheduler as well.

> > Another way of simplifying reload would be to try to generate auto-inc
> > addressing after reload has run, not before. That would eliminate some more
> > special-case code (and it might make other parts of the compiler simpler as
> > well).
> 
> But then you wouldn't get the heuristics for the earlier passes right.

Which heuristics are those?

Bernd

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-19 19:00 Joern Rennecke
  0 siblings, 0 replies; 33+ messages in thread
From: Joern Rennecke @ 1997-08-19 19:00 UTC (permalink / raw)
  To: egcs

> Then, there are some simplifications that could be done. I don't like the
> inheritance code, find_equiv_reg and all that. IMHO reload shouldn't try to be
> very clever about this sort of thing - the reload_cse_regs pass can be made
> more clever. I've already submitted a patch to Kenner that enables
> reload_cse_regs to generate optional reloads. If we could add some more
> cleverness (e.g. deleting redundant stores into spill slots or eliminating
> register-register copies), quite a bit of code in reload could be deleted.
> I've already made some experiments in this direction which indicate that this
> approach may be feasible.

The problem with this is that you might have additional spills if you
remove reload inherits - when you inherit a reload that needs a secondary
reload, you extend the lifetime of only one reload register, instead of needing
two (or more, when a tertiary reload would be needed too.)

Moreover, if you remove reload inherits from / to memory, you might end
up with code that looks like it could have an alias problem for doing cse on
it, and reload_cse can't do anything.

I actually think that the reload inheriting code should be mae smarter.
I did some patches to do that and got more benefit for the code I tested
than from reload_cse.  However, they never got accepted for the FSF gcc...
I'll re-fit them into egcs when I find some time for that.  Should I
sent them as multiple (conflicting) patches or as a single patch?

The main issues are:

- inheriting reloads from non-reload registers

- emitting more complete REG_ notes when doing reloads, so that they can
  be found for inherits

- being smarter about finding possible inherits

> Another way of simplifying reload would be to try to generate auto-inc
> addressing after reload has run, not before. That would eliminate some more
> special-case code (and it might make other parts of the compiler simpler as
> well).

But then you wouldn't get the heuristics for the earlier passes right.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-19  8:50 Jakub Jelinek
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Jelinek @ 1997-08-19  8:50 UTC (permalink / raw)
  To: egcs

> 
>    Date: Mon, 18 Aug 1997 11:17:38 -0400 (EDT)
>    From: meissner@cygnus.com
> 
>    I think it is an horrible kludge that reload is run as a pass after
>    global-alloc, and that it forces reload registers not to be used
>    for any other purpose (which is murder on the x86 with each
>    register being special purpose in some way).
> 
> Before this leaves my head, I wanted to point something out which
> you've reminded me of.  When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations.  Reloads which would
> not have happened if scheduling did not take place.  This happens
> especially if register pressure is high already.  I noticed this
> particularly on RISC platforms, seems in this case the more registers
> available the worse things became when the register usage was
> saturated.

I thought about a quick solution, which would be during global-alloc, if it
finds out that the number of hard registers is exceeded, it could try to
undo some short pseudo setup RTL sequence merges and move them to the place
of the actual use, if the pseudo being set up is a constant and computable
in small number of instructions not involving memory loads.
Like that, we could rid of the following horror on sparc64:

	sethi %hi(var1), %r1
	stx %r1, [%sp + NN]
	...
	ldx [%sp + NN], %r1
	or %r1, %lo(var1), %r1
	stx %r1, [%sp + NN]
	...
some loop:
	...
	ldx [%sp + NN], %r1
	ldx [%r1], %r1
	...

and could have:

some loop:
	...
	sethi %hi(var1), %r1
	ldx [%r1 + %lo(var1)], %r1
	...

instead...

Cheers,
    Jakub
___________________________________________________________________
Jakub Jelinek | jj@sunsite.mff.cuni.cz | http://sunsite.mff.cuni.cz
Administrator of SunSITE Czech Republic, MFF, Charles University
___________________________________________________________________
Ultralinux - first 64bit OS to take fool power from the UltraSparc
Linux version 2.0.30 on a sparc machine (291.64 BogoMips).
___________________________________________________________________

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
@ 1997-08-19  8:08 ` Bernd Schmidt
  1997-08-19  8:08 ` Bernd Schmidt
  2 siblings, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-19  8:08 UTC (permalink / raw)
  To: egcs

> My comment on reload is meant for more
> reload as we currently know it.  I think it is an horrible kludge that reload
> is run as a pass after global-alloc, and that it forces reload registers not to
> be used for any other purpose (which is murder on the x86 with each register
> being special purpose in some way).

That problem can be solved; actually it is mostly solved in my patch.

> I think it should be integrated into global-alloc, taking lifetimes, ranges,
> etc. into consideration, possibly leaving the fossil reload for when you are
> not optimizing.  Given that reload has pretty much been the place where we all
> fear to tread, at least since RMS handed over the gcc2 reigns to Kenner (and
> even in the 1.3x time frame, I got the sense that RMS no longer had a good
> handle on reload anymore), I think it is time for a rethought.  Now, given
> it is a rather critical piece of the compiler, it may take months if not
> years to get it better than it currently is.

I have some ideas for changing reload that could go on top of what I have. I'll
describe what I'm planning to do, if you have any additional ideas I'd be happy
to hear about them.
First, I'd like to eliminate the code that counts the needs for registers
globally. The patch I sent is a first step in that direction; it could probably
easily be extended to spill registers locally for every instruction. This would
require a slightly different way of calculating the possible damage from
spilling a register, but I think it would not be hard to do.
It would be nice to have code that detects that an insn needs a spill reg in a
one-register class (like ecx for variable shifts on the 386), but a different
register is free. In that case, reload should move ecx to the free register
before the instruction, and back afterwards, if this is profitable. I'm not
sure how hard that would be, and I haven't experimented with that kind of
thing yet.
Then, there are some simplifications that could be done. I don't like the
inheritance code, find_equiv_reg and all that. IMHO reload shouldn't try to be
very clever about this sort of thing - the reload_cse_regs pass can be made
more clever. I've already submitted a patch to Kenner that enables
reload_cse_regs to generate optional reloads. If we could add some more
cleverness (e.g. deleting redundant stores into spill slots or eliminating
register-register copies), quite a bit of code in reload could be deleted.
I've already made some experiments in this direction which indicate that this
approach may be feasible.
Another way of simplifying reload would be to try to generate auto-inc
addressing after reload has run, not before. That would eliminate some more
special-case code (and it might make other parts of the compiler simpler as
well).

Bernd

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
@ 1997-08-19  8:08 ` Jeffrey A Law
  1997-08-19  8:08 ` Bernd Schmidt
  1997-08-19  8:08 ` Bernd Schmidt
  2 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1997-08-19  8:08 UTC (permalink / raw)
  To: egcs

  In message <Pine.SOL.3.90.970819094031.291D-100000@starsky.informatik.rwth-aa
chen.de>you write:
  > The idea of running sched before reload seems to be to improve code like
  > this:
  > move mem1 => pseudo1
  > move pseudo1 => mem2
  > move mem3 => pseudo2
  > move pseudo2 => mem4
  > move mem5 => pseudo3
  > move pseudo3 => mem6
Or, more generally:

  * The scheduling pass before reload has fewer data dependencies (because
  it works with pseudos), and thus can more instructions more freely to
  produce better schedules.  However, the downside is potentially increased
  register pressure.

  * The scheduling pass after reload primarily helps with scheduling of
  spill code and prologue/epilogue insns.  With pseudos mapped to hard
  registers, there are far fewer opportunities for the scheduler to
  move instructions.

  > If this is left as it stands, register allocation will most likely allocate
  > the pseudo to the same hard register. This means the post-reload sched pass
  > can't do anything with it, and the CPU can't either because there is no
  > parallelism in the code (well, at least the Pentium can't).
Yup.

  > you suddenly have two blocks of three independent instructions which could
  > run in parallel. However, this will lose badly once you don't have three
  > instructions of that kind, but a hundred (since your average CPU doesn't
  > have a hundred hard registers).
Yup.  Hence my message about throttling the scheduler once the ratio of
pseudos to hard registers hits some magic number.


  > Another approach I've been thinking about is to add code that analyzes code
  > like this after reload
  > 
  > move mem1 => hardreg1
  > move hardreg1 => mem2
  > move mem3 => hardreg1
  > move hardreg1 => mem4
  > move mem5 => hardreg1
  > move hardreg1 => mem6
  > 
  > and tries to make it use as many independent hard registers as possible.
  > That would make the scheduling opportunities available without the risk
  > of over-scheduling before reload. I don't know how feasible this is.
Yes, I've considered doing similar things myself, but I didn't think it
was really worth the effort..

Jeff
Jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
  1997-08-19  8:08 ` Bernd Schmidt
@ 1997-08-19  8:08 ` Bernd Schmidt
  2 siblings, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-19  8:08 UTC (permalink / raw)
  To: egcs

> Before this leaves my head, I wanted to point something out which
> you've reminded me of.  When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations.  

The idea of running sched before reload seems to be to improve code like this:
move mem1 => pseudo1
move pseudo1 => mem2
move mem3 => pseudo2
move pseudo2 => mem4
move mem5 => pseudo3
move pseudo3 => mem6

If this is left as it stands, register allocation will most likely allocate
the pseudo to the same hard register. This means the post-reload sched pass
can't do anything with it, and the CPU can't either because there is no
parallelism in the code (well, at least the Pentium can't).

If sched modifies the above to look like this

move mem1 => pseudo1
move mem3 => pseudo2
move mem5 => pseudo3
move pseudo1 => mem2
move pseudo2 => mem4
move pseudo3 => mem6

you suddenly have two blocks of three independent instructions which could
run in parallel. However, this will lose badly once you don't have three
instructions of that kind, but a hundred (since your average CPU doesn't
have a hundred hard registers).

Another approach I've been thinking about is to add code that analyzes code
like this after reload

move mem1 => hardreg1
move hardreg1 => mem2
move mem3 => hardreg1
move hardreg1 => mem4
move mem5 => hardreg1
move hardreg1 => mem6

and tries to make it use as many independent hard registers as possible.
That would make the scheduling opportunities available without the risk
of over-scheduling before reload. I don't know how feasible this is.

Bernd

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Reload patch to improve 386 code
@ 1997-08-18 20:47 Mike Meissner
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Meissner @ 1997-08-18 20:47 UTC (permalink / raw)
  To: egcs

Bernd, I just checked at the FSF, and did not notice a copyright
assignment from you.  Could you do that so that we can consider using
your code?  If are employed as a programmer, particularly if you
developed the patch on company time, you need to get somebody who can
sign for the company (such as a vice president or director) to
disclaim all copyright claims to the code.  Here is the sample from
the COPYING file, you would obviously alter the names.

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
  `Gnomovision' (which makes passes at compilers) written by James Hacker.

  <signature of Ty Coon>, 1 April 1989
  Ty Coon, President of Vice

You would then write a similar copyright disclaimer, giving the
changes to the FSF.

-- 
Michael Meissner, Cygnus Solutions (East Coast)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meissner@cygnus.com,	617-354-5416 (office),	617-354-7161 (fax)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-18 20:46 Joern Rennecke
  0 siblings, 0 replies; 33+ messages in thread
From: Joern Rennecke @ 1997-08-18 20:46 UTC (permalink / raw)
  To: egcs

> 
>   In message <199708181855.OAA03711@jenolan.rutgers.edu>you write:
>   > Before this leaves my head, I wanted to point something out which
>   > you've reminded me of.  When the scheduler (this applies to both the
>   > original and Haifa versions equally) becomes aggressive, it produces a
>   > large number of reloads in certain situations.  Reloads which would
>   > not have happened if scheduling did not take place.  This happens
>   > especially if register pressure is high already.  I noticed this
>   > particularly on RISC platforms, seems in this case the more registers
>   > available the worse things became when the register usage was
>   > saturated.
> Yup.  Absolutely.  This is a known issue with schedulers in general;
> you can see this in action if you look at fpppp in spec.
> 
> The haifa scheduler has some code to "prefer" schedules which don't
> lengthen register lifetimes, but I don't know how effective it really
> is.  The normal scheduler has some code to do this too, but it's less
> effective than haifa.
> 
> However, I've done experiments and the more I make the scheduler
> try to reduce register lifetimes, the worse _overall_ performance I
> see.
> 
> It almost makes me think such code should be dependent on the number
> of registers actually in the code -- ie once the ratio of pseudos
> to hard registers hits some magic point, the code to shorten lifetimes
> kicks in and tries to keep the scheduler from extending register
> lifetimes.

I think we could get better results if the scheduler just does its work,
but leaves enough information for reload so that reload can do un-do some
aggressive scheduling if this saves some reloads and these savings
outweigh the penalty for sub-optimal scheduling.

In a similar vein, it would be useful if regmove could pass some information
of the transformation it has done down to reload.  If some value ends up in
a stack slot, some of the regmove transformation would better be undone.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-18 18:55 David S. Miller
@ 1997-08-18 19:09 ` Jeffrey A Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1997-08-18 19:09 UTC (permalink / raw)
  To: egcs

  In message <199708181855.OAA03711@jenolan.rutgers.edu>you write:
  > Before this leaves my head, I wanted to point something out which
  > you've reminded me of.  When the scheduler (this applies to both the
  > original and Haifa versions equally) becomes aggressive, it produces a
  > large number of reloads in certain situations.  Reloads which would
  > not have happened if scheduling did not take place.  This happens
  > especially if register pressure is high already.  I noticed this
  > particularly on RISC platforms, seems in this case the more registers
  > available the worse things became when the register usage was
  > saturated.
Yup.  Absolutely.  This is a known issue with schedulers in general;
you can see this in action if you look at fpppp in spec.

The haifa scheduler has some code to "prefer" schedules which don't
lengthen register lifetimes, but I don't know how effective it really
is.  The normal scheduler has some code to do this too, but it's less
effective than haifa.

However, I've done experiments and the more I make the scheduler
try to reduce register lifetimes, the worse _overall_ performance I
see.

It almost makes me think such code should be dependent on the number
of registers actually in the code -- ie once the ratio of pseudos
to hard registers hits some magic point, the code to shorten lifetimes
kicks in and tries to keep the scheduler from extending register
lifetimes.

jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-18 18:55 David S. Miller
  1997-08-18 19:09 ` Jeffrey A Law
  0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 1997-08-18 18:55 UTC (permalink / raw)
  To: egcs

   Date: Mon, 18 Aug 1997 11:17:38 -0400 (EDT)
   From: meissner@cygnus.com

   I think it is an horrible kludge that reload is run as a pass after
   global-alloc, and that it forces reload registers not to be used
   for any other purpose (which is murder on the x86 with each
   register being special purpose in some way).

Before this leaves my head, I wanted to point something out which
you've reminded me of.  When the scheduler (this applies to both the
original and Haifa versions equally) becomes aggressive, it produces a
large number of reloads in certain situations.  Reloads which would
not have happened if scheduling did not take place.  This happens
especially if register pressure is high already.  I noticed this
particularly on RISC platforms, seems in this case the more registers
available the worse things became when the register usage was
saturated.

Another thing I saw gcc do for a register constrained function was the
following.  Consider a function which references a lot of global data
structures, which it iterates over as a 3 level indirected table.  It
also references 2 or 3 global pieces of data in a few distinct places
within the iteration (the specific example is specifically the
function mm/memory.c:copy_page_range() on sparc64 in the Linux kernel)
A rough outline is:

inline copy_one_pte() {
	if(pte_none(pte))
		return;
	if(!pte_present(pte)) {
		duplicate_swap_information();
		return;
	}
	copy_pte_info_and_maybe_cow();
}

inline copy_pte_range() {
	for_all_ptes_in_this_pmd()
		copy_one_pte();
}

inline copy_pmd_range() {
	for_all_pmds_in_this_pgd()
		copy_pte_range();
}

copy_page_range() {
	for_all_pgds_in_this_range()
		copy_pmd_range();
}

Now, I have two global physical addresses I use for "null" page
tables, which is what newly allocated page tables point to before they
are set to point to anything specific.  There is heavy inlining done
here and on sparc64 it can end up allocating 300 or 400 bytes of stack
space.  Anyways, back to my main point, the two physical addresses for
the null page table pages live in two global variables
"null_pte_table" and "null_pmd_table".  The progression in the
produced code wrt. these two variables is:

copy_page_range:

	sethi	%hi(null_pte_table), %r
	or	%r, %lo(null_pte_table), %r
	stx	%r, [%sp + slot1]
	sethi	%hi(null_pmd_table), %r
	or	%r, %lo(null_pmd_table), %r
	stx	%r, [%sp + slot2]

...

main_loop:
	...
	/* we need to use null_pmd_table's value here */
	ldx	[%sp + slot1], %r
	ldx	[%r], %r2
	use it
	stx	%r2, [%sp + slot3]
	...
	/* we need to use null_pte_table's value here */
	ldx	[%sp + slot2], %r
	ldx	[%r], %r2
	use it
	stx	%r2, [%sp + %slot4]	
	...
	/* need again to use null_pmd_table's value */
	ldx	[%sp + slot3], %r
	use it
	...
	/* need again to use null_pte_table's value */
	ldx	[%sp + slot4], %r
	use it
	...
	b	main_loop

Anyways you get the picture, I see often these "gradual" reloads,
where final values are computed partially, given a stack slot for
reloading purposes, and then loaded back and computed further, then
given yet another stack slot.  On RISC systems this can account for a
significant portion of the stack slots taken up for a given function.

   Now, given it is a rather critical piece of the compiler, it may
   take months if not years to get it better than it currently is.

Well, perhaps it would be prudent to have people start now in the
background to get this work going?

Later,
David "Sparc" Miller
davem@caip.rutgers.edu

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-18 15:36 Toon Moene
  0 siblings, 0 replies; 33+ messages in thread
From: Toon Moene @ 1997-08-18 15:36 UTC (permalink / raw)
  To: egcs

  In message <9708181258.AA19853@moene.indiv.nluug.nl> I wrote:

  > Isn't this something that would have been solved [ coniunctivus  
  > irrealis ! ] by live range splitting ?

Jeff:

>  Not entirely.  live range splitting should significantly
>  reduce the number of spills, but it won't always be able
>  to completely eliminate them (in theory it can completely
>  eliminate them, but in practice it doesn't).  Thus you
>  still need to deal with spilling registers.

I tried to reply to the following:

[ Bernd Schmidt: ]

>  it is
>  possible to use register life information to determine
>  that certain pseudos are not live across any instructions
>  that need reloads, and thus need not be spilled.

i.e. the _effect_ that live range splitting has on spilling; I know  
it cannot eliminate it.

Cheers,
Toon.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-18 15:11 meissner
  0 siblings, 0 replies; 33+ messages in thread
From: meissner @ 1997-08-18 15:11 UTC (permalink / raw)
  To: egcs

|   In message <9708181258.AA19853@moene.indiv.nluug.nl>you write:
|   > Isn't this something that would have been solved [ coniunctivus  
|   > irrealis ! ] by live range splitting ?
| Not entirely.  live range splitting should significantly reduce
| the number of spills, but it won't always be able to completely
| eliminate them (in theory it can completely eliminate them, but
| in practice it doesn't).  Thus you still need to deal with spilling
| registers.

Yes, I think it is obvious that you will always need to spill registers.  Also,
live range splitting as I'm currently implementing it will still be generating
spills.  I do think the previous posting on register spills is also helpful and
plan to look at it when I get in the office (I didn't mention to disparage it,
my first posting was made pre-coffee).  My comment on reload is meant for more
reload as we currently know it.  I think it is an horrible kludge that reload
is run as a pass after global-alloc, and that it forces reload registers not to
be used for any other purpose (which is murder on the x86 with each register
being special purpose in some way).  I think it should be integrated into
global-alloc, taking lifetimes, ranges, etc. into consideration, possibly
leaving the fossil reload for when you are not optimizing.  Given that reload
has pretty much been the place where we all fear to tread, at least since RMS
handed over the gcc2 reigns to Kenner (and even in the 1.3x time frame, I got
the sense that RMS no longer had a good handle on reload anymore), I think it
is time for a rethought.  Now, given it is a rather critical piece of the
compiler, it may take months if not years to get it better than it currently
is.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-18 14:53 Monday morning Philippe Laliberte
@ 1997-08-18 14:54 ` Jeffrey A Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeffrey A Law @ 1997-08-18 14:54 UTC (permalink / raw)
  To: egcs

  In message <9708181258.AA19853@moene.indiv.nluug.nl>you write:
  > Isn't this something that would have been solved [ coniunctivus  
  > irrealis ! ] by live range splitting ?
Not entirely.  live range splitting should significantly reduce
the number of spills, but it won't always be able to completely
eliminate them (in theory it can completely eliminate them, but
in practice it doesn't).  Thus you still need to deal with spilling
registers.

Jeff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
@ 1997-08-18 14:53 meissner
  0 siblings, 0 replies; 33+ messages in thread
From: meissner @ 1997-08-18 14:53 UTC (permalink / raw)
  To: egcs

| >  I have come up with a patch for the reload pass that can
| >  improve the generated code on the i386 quite dramatically.
| >  The problem with reload is that when it decides to spill
| >  a hard register, it spills _all_ pseudos allocated to
| >  this hard register.  This is often not necessary, it is
| >  possible to use register life information to determine
| >  that certain pseudos are not live across any instructions
| >  that need reloads, and thus need not be spilled.
| 
| Isn't this something that would have been solved [ coniunctivus  
| irrealis ! ] by live range splitting ?
| 
| "pseudo's that are not live across instructions" sounds to me as  
| the definition of the place to split the liveness range.
| 
| Or am I missing something ?

My opinion is reload needs to be rewritten completely.  I'm in the middle of
working on a beginning of live range splitting right now, but it probably will
be some time before it moves to egcs (ie, its not ready yet).

| OTOH, "beter een vogel in de hand dan tien in de lucht" (sorry,  
| couldn't resist).
| 
| Cheers,
| Toon.
| 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Reload patch to improve 386 code
  1997-08-18  8:13 Test result for 970814 on native sparc-sun-solaris2.5.1 Klaus Kaempf
  1997-08-18  8:22 ` Reload patch to improve 386 code Bernd Schmidt
@ 1997-08-18 10:11 ` Bernd Schmidt
  1 sibling, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-18 10:11 UTC (permalink / raw)
  To: egcs

I have come up with a patch for the reload pass that can improve the generated
code on the i386 quite dramatically.  The problem with reload is that when it
decides to spill a hard register, it spills _all_ pseudos allocated to this
hard register.  This is often not necessary, it is possible to use register
life information to determine that certain pseudos are not live across any
instructions that need reloads, and thus need not be spilled.

What the patch does (slightly simplified, for the full picture, read the
source):
  - record for each instruction which pseudos are live in which hard regs
    during the insn.
  - for each insn that needs reloads, record the needs, unless the insn
    requires groups. If the insn needs groups, the new code doesn't yet
    bother with trying to do a better job than the old code. The assumption
    is that most insns don't need groups.
  - Don't spill all the pseudos allocated to a spill register. Instead, for
    each instruction that needs reloads, see which subset of the spill
    registers can be used to satisfy the needs of the insn with the lowest
    cost. Usually, not all the spill registers are needed for a given insn.
    Then spill all pseudos which are allocated to the needed subset of spill
    registers and live across the insn that needs these spill registers.

I also disabled the code in local-alloc that used to allocate SCRATCHes.
Reload can handle that; I don't see a point in having duplicate code (and it's
slightly easier to do things right this way).

To measure the improvements in code quality, I built the Linux kernel (2.1.50)
both with an unmodified compiler and with a patched one. The patch reduced
the size of the text section by 28000 bytes.

Possible problems:
  - While caller-saves still generally work, I see no way to make the old code
    to handle caller-saves that need a spill register work. Currently,
    whenever reload notices that caller-saves would need a spill register, it
    spills all pseudos allocated to caller-saved registers.
  - reorg.c assumes that all registers used as spill registers are dead at
    a CODE_LABEL. This is wrong now, and I disabled the code.

Both of these problems should only affect code quality; I think the patch is
correct in these cases (although I couldn't test either). I don't know which
machines are affected by this, and I don't know if and how much code quality
suffers on these machines. I'd be happy about feedback.

The compiler passed "make bootstrap; make gnucompare", and the stage2
compiler passed c-torture without additional failures.

If someone can confirm that the advantages generally outweigh these potential
problems, I would like this to go into the compiler source. I've also
submitted it to Richard Kenner for inclusion in gcc2.

	* reload.h (struct insn_chain, reload_insn_chain, new_insn_chain):
	Declare.
	(save_call_clobbered_regs): Change enum machine_mode argument to int.
	* reload1.c (new_insn_chain, regno_becomes_live, reg_becomes_live,
	regno_dies, build_insn_chain, finish_spills, spill_pseudos,
	spill_caller_saved_regs, spill_sortfn): New functions.
	(reg_old_renumber, real_n_spills, pseudo_forbidden_regs,
	spilled_pseudos, non_spill_reload_regs, current_chain,
	spills_this_insn): New static variables.
	(reload_startobj, reload_insn_chain): New variables.
	(basic_block_needs): Delete variable.
	(forbidden_regs): Now static.
	(new_spill_register): Don't call spill_hard_reg.
	(spill_hard_reg): Don't actually spill any pseudos, instead set a bit
	in spilled_pseudos.  Don't try to reallocate pseudos.  Delete code to
	handle scratches on scratch_list.
	(reload): Delete code to handle basic_block_needs.  Delete code to
	handle caller-saves that need a spill reg.  When caller-saves would
	need a spill_register, call spill_caller_saved_regs.
	Call build_insn_chain.  Set up reload_firstobj and reload_startobj.
	Allocate space for reg_old_renumber, spilled_pseudos and
	pseudo_forbidden_regs when these will be used subsequently.
	Don't walk the rtl insns directly, use reload_insn_chain.
	Whenever pseudos may need to be spilled (i.e., after calling
	new_spill_reg or spill_hard_reg), call finish_spills.
	Copy n_spills to real_n_spills before calling finish_spills. Use
	real_n_spills to determine how many regs to put back into
	potential_reload_regs.
	Record needs for reloads and eliminations in the insn_chain
	structures, not as the insn mode.  Also record the needs for spill
	registers of all insns that don't need groups if life information is
	available.
	Clear pseudo_forbidden_regs and used_spill_regs before starting to
	select spill registers.
	Don't pass a mode to save_call_clobbered_regs, pass an integer that
	says whether eliminations are necessary.
	Delete code to compute used_spill_regs.
	Before returning, free spilled_pseudos and everything on
	reload_obstack.
	(reload_as_needed): Don't walk the rtl insns directly, use
	reload_insn_chain.  Delete code to check basic_block_needs.
	Pass live_known and the current insn_chain to choose_reload_regs.
	(choose_reload_regs): Add parameter global.  Don't take an rtx
	parameter for the insn, use an insn_chain structure.  
	When life information is available, use the information in the
	insn_chain structure to set in reload_regs_used all registers that
	may not be used.
	* local-alloc.c (local_alloc): Disable code to allocate SCRATCHes.
	* caller-save.c (restore_referenced_regs, insert_save_restore): Take
	a struct insn_chain * argument instead of rtx for the insn.  Take an
	int argument to determine whether eliminations are needed instead of
	an enum machine_mode argument.
	(restore_referenced_regs): Change calls to these functions.
	(save_call_clobbered_regs): Likewise.
	Walk the reload_insn_chain instead of the rtl insns.
	(insert_save_restore): Additionally to emitting a new insn, call
	new_insn_chain to obtain a new insn_chain structure, fill it in and
	put it into the right place.  Set need_elim field as appropriate.
	* reorg.c (find_dead_or_set_registers): Comment out code that relies
	on the now false assumption that reload regs are dead at CODE_LABELs.

*** ./reload1.c.orig-1	Sun Aug 17 10:43:08 1997
--- ./reload1.c	Sun Aug 17 21:56:15 1997
*************** Boston, MA 02111-1307, USA.  */
*** 57,62 ****
--- 57,64 ----
     available hard regs can be found.  Spilling can invalidate more
     insns, requiring additional need for reloads, so we must keep checking
     until the process stabilizes.
+    If we have register life information, it is often possible to avoid
+    spilling all the pseudos allocated to a reload register.
  
     For machines with different classes of registers, we must keep track
     of the register class needed for each reload, and make sure that
*************** static int *reg_max_ref_width;
*** 118,123 ****
--- 120,128 ----
     constant or memory slot.  */
  static rtx *reg_equiv_init;
  
+ /* Vector to remember old contents of reg_renumber before spilling.  */
+ static short *reg_old_renumber;
+ 
  /* During reload_as_needed, element N contains the last pseudo regno
     reloaded into the Nth reload register.  This vector is in parallel
     with spill_regs.  If that pseudo reg occupied more than one register,
*************** static rtx reg_reloaded_insn[FIRST_PSEUD
*** 134,139 ****
--- 139,148 ----
  /* Number of spill-regs so far; number of valid elements of spill_regs.  */
  static int n_spills;
  
+ /* Number of real spill-regs made; in optimizing compilation, we may put
+    some additional registers at the end of spill_regs.  */
+ static int real_n_spills;
+ 
  /* In parallel with spill_regs, contains REG rtx's for those regs.
     Holds the last rtx used for any given reg, or 0 if it has never
     been used for spilling yet.  This rtx is reused, provided it has
*************** static short spill_reg_order[FIRST_PSEUD
*** 154,160 ****
  /* This reg set indicates registers that may not be used for retrying global
     allocation.  The registers that may not be used include all spill registers
     and the frame pointer (if we are using one).  */
! HARD_REG_SET forbidden_regs;
  
  /* This reg set indicates registers that are not good for spill registers.
     They will not be used to complete groups of spill registers.  This includes
--- 163,174 ----
  /* This reg set indicates registers that may not be used for retrying global
     allocation.  The registers that may not be used include all spill registers
     and the frame pointer (if we are using one).  */
! static HARD_REG_SET forbidden_regs;
! 
! /* This vector of reg sets indicates, for each pseudo, which hard registers may
!    not be used for retrying global allocation, additionally to those in
!    forbidden_regs.  */
! static HARD_REG_SET *pseudo_forbidden_regs;
  
  /* This reg set indicates registers that are not good for spill registers.
     They will not be used to complete groups of spill registers.  This includes
*************** static short spill_regs[FIRST_PSEUDO_REG
*** 173,179 ****
  /* This reg set indicates those registers that have been used a spill
     registers.  This information is used in reorg.c, to help figure out
     what registers are live at any point.  It is assumed that all spill_regs
!    are dead at every CODE_LABEL.  */
  
  HARD_REG_SET used_spill_regs;
  
--- 187,194 ----
  /* This reg set indicates those registers that have been used a spill
     registers.  This information is used in reorg.c, to help figure out
     what registers are live at any point.  It is assumed that all spill_regs
!    are dead at every CODE_LABEL.  
!    @@@ That's a bug!  */
  
  HARD_REG_SET used_spill_regs;
  
*************** static rtx spill_stack_slot[FIRST_PSEUDO
*** 237,248 ****
  
  static int spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
  
! /* Indexed by register class and basic block number, nonzero if there is
!    any need for a spill register of that class in that basic block.
!    The pointer is 0 if we did stupid allocation and don't know
!    the structure of basic blocks.  */
  
! char *basic_block_needs[N_REG_CLASSES];
  
  /* First uid used by insns created by reload in this function.
     Used in find_equiv_reg.  */
--- 252,260 ----
  
  static int spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
  
! /* Record which pseudos needed to be spilled.  */
  
! static regset spilled_pseudos;
  
  /* First uid used by insns created by reload in this function.
     Used in find_equiv_reg.  */
*************** enum insn_code reload_out_optab[NUM_MACH
*** 283,288 ****
--- 295,301 ----
  
  struct obstack reload_obstack;
  char *reload_firstobj;
+ char *reload_startobj;
  
  #define obstack_chunk_alloc xmalloc
  #define obstack_chunk_free free
*************** extern rtx forced_labels;
*** 292,297 ****
--- 305,312 ----
  
  /* Allocation number table from global register allocation.  */
  extern int *reg_allocno;
+ 
+ struct insn_chain *reload_insn_chain;
  \f
  /* This structure is used to record information about register eliminations.
     Each array entry describes one possible way of eliminating a register
*************** static int modes_equiv_for_class_p	PROTO
*** 363,368 ****
--- 378,387 ----
  static void spill_failure		PROTO((rtx));
  static int new_spill_reg		PROTO((int, int, int *, int *, int,
  					       FILE *));
+ static int spill_sortfn			PROTO((void *, void *));
+ static void spill_pseudos		PROTO((struct insn_chain *));
+ static int finish_spills		PROTO((int, FILE *));
+ static void spill_caller_saved_regs	PROTO((void));
  static void delete_dead_insn		PROTO((rtx));
  static void alter_reg  			PROTO((int, int));
  static void mark_scratch_live		PROTO((rtx));
*************** static int reload_reg_free_before_p	PROT
*** 386,398 ****
  static int reload_reg_reaches_end_p	PROTO((int, int, enum reload_type));
  static int reloads_conflict 		PROTO((int, int));
  static int allocate_reload_reg		PROTO((int, rtx, int, int));
! static void choose_reload_regs		PROTO((rtx, rtx));
  static void merge_assigned_reloads	PROTO((rtx));
  static void emit_reload_insns		PROTO((rtx));
  static void delete_output_reload	PROTO((rtx, int, rtx));
  static void inc_for_reload		PROTO((rtx, rtx, int));
  static int constraint_accepts_reg_p	PROTO((char *, rtx));
  static int count_occurrences		PROTO((rtx, rtx));
  static void reload_cse_invalidate_regno	PROTO((int, enum machine_mode, int));
  static int reload_cse_mem_conflict_p	PROTO((rtx, rtx));
  static void reload_cse_invalidate_mem	PROTO((rtx));
--- 405,418 ----
  static int reload_reg_reaches_end_p	PROTO((int, int, enum reload_type));
  static int reloads_conflict 		PROTO((int, int));
  static int allocate_reload_reg		PROTO((int, rtx, int, int));
! static void choose_reload_regs		PROTO((struct insn_chain *, rtx, int));
  static void merge_assigned_reloads	PROTO((rtx));
  static void emit_reload_insns		PROTO((rtx));
  static void delete_output_reload	PROTO((rtx, int, rtx));
  static void inc_for_reload		PROTO((rtx, rtx, int));
  static int constraint_accepts_reg_p	PROTO((char *, rtx));
  static int count_occurrences		PROTO((rtx, rtx));
+ 
  static void reload_cse_invalidate_regno	PROTO((int, enum machine_mode, int));
  static int reload_cse_mem_conflict_p	PROTO((rtx, rtx));
  static void reload_cse_invalidate_mem	PROTO((rtx));
*************** init_reload ()
*** 451,457 ****
  
    /* Initialize obstack for our rtl allocation.  */
    gcc_obstack_init (&reload_obstack);
-   reload_firstobj = (char *) obstack_alloc (&reload_obstack, 0);
  
    /* Decide which register class should be used when reloading
       addresses.  If we are using SMALL_REGISTER_CLASSES, and any
--- 471,476 ----
*************** init_reload ()
*** 514,519 ****
--- 533,684 ----
  #endif /* SMALL_REGISTER_CLASSES */
  }
  
+ /* Allocate an empty insn_chain structure.  */
+ struct insn_chain *
+ new_insn_chain ()
+ {
+   struct insn_chain *c;
+ 
+   c = obstack_alloc (&reload_obstack, sizeof (struct insn_chain));
+   reload_firstobj = (char *) obstack_alloc (&reload_obstack, 0);
+   return c;
+ }
+ 
+ /* Used for communication between the following functions.  Holds the
+    current life information.  */
+ static int inverse_renumber[FIRST_PSEUDO_REGISTER];
+ 
+ /* Record in inverse_renumber that register REGNO became live.  */
+ static void
+ regno_becomes_live (regno)
+      int regno;
+ {
+   int hardregno, nregs;
+   if (regno < FIRST_PSEUDO_REGISTER)
+     hardregno = regno, nregs = 1, regno = -1;
+   else
+     {
+       hardregno = reg_renumber[regno];
+       if (hardregno < 0)
+ 	return;
+       nregs = HARD_REGNO_NREGS (hardregno, PSEUDO_REGNO_MODE (regno));
+     }
+ 
+   while (nregs--)
+     inverse_renumber[hardregno++] = regno;
+ }
+ 
+ /* Record in inverse_renumber that register REG became live.  This
+    is called via note_stores.  */
+ static void
+ reg_becomes_live (reg, setter)
+      rtx reg, setter;
+ {
+   if (GET_CODE (reg) == REG)
+     regno_becomes_live (REGNO (reg));
+ }
+ 
+ /* Record in inverse_renumber that register REGNO died.  */
+ static void
+ reg_dies (regno)
+      int regno;
+ {
+   int hardregno, nregs;
+   if (regno < FIRST_PSEUDO_REGISTER)
+     hardregno = regno, nregs = 1;
+   else
+     {
+       hardregno = reg_renumber[regno];
+       if (hardregno < 0)
+ 	return;
+       nregs = HARD_REGNO_NREGS (hardregno, PSEUDO_REGNO_MODE (regno));
+     }
+ 
+   while (nregs--)
+     inverse_renumber[hardregno++] = 0;
+ }
+ 
+ /* Walk the insns of the current function and build reload_insn_chain,
+    and record register life information if it is available.  */
+ static void
+ build_insn_chain (first, global)
+      rtx first;
+      int global;
+ {
+   struct insn_chain **p = &reload_insn_chain;
+   struct insn_chain *prev = 0;
+   int b = 0;
+ 
+   while (first)
+     {
+       struct insn_chain *c;
+       if (global && first == basic_block_head[b])
+ 	{
+ 	  int regno;
+ 	  bzero ((char *)inverse_renumber, sizeof inverse_renumber);
+ 	  EXECUTE_IF_SET_IN_REG_SET (basic_block_live_at_start[b],
+ 				     0, regno,
+ 				     {
+ 				       regno_becomes_live (regno);
+ 				     });
+ 	}
+ 
+       if (GET_CODE (first) != NOTE && GET_CODE (first) != BARRIER)
+ 	{
+ 	  c = new_insn_chain ();
+ 	  c->prev = prev;
+ 	  prev = c;
+ 	  *p = c;
+ 	  p = &c->next;
+ 	  c->insn = first;
+ 	  c->block = b;
+ 	  c->need_reload = 0;
+ 	  c->need_elim = 0;
+ 	  bcopy (inverse_renumber, c->inverse_renum_before, sizeof inverse_renumber);
+ 	}
+ 
+       if (GET_RTX_CLASS (GET_CODE (first)) == 'i' && global)
+ 	{
+ 	  rtx link;
+ 
+ 	  /* Mark the death of everything that dies in this instruction.  */
+ 
+ 	  for (link = REG_NOTES (first); link; link = XEXP (link, 1))
+ 	    if (REG_NOTE_KIND (link) == REG_DEAD
+ 		&& GET_CODE (XEXP (link, 0)) == REG)
+ 	      reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+ 
+ 	  /* Mark everything born in this instruction as live.  */
+ 
+ 	  note_stores (PATTERN (first), reg_becomes_live);
+ 	}
+ 
+       /* Remember which registers are live at the end of the insn, before
+ 	 killing those with REG_UNUSED notes.  */
+       if (GET_CODE (first) != NOTE && GET_CODE (first) != BARRIER && global)
+ 	{
+ 	  bcopy (inverse_renumber, c->inverse_renum_after, sizeof inverse_renumber);
+ 	}
+ 
+       if (GET_RTX_CLASS (GET_CODE (first)) == 'i' && global)
+ 	{
+ 	  rtx link;
+ 
+ 	  /* Mark anything that is set in this insn and then unused as dying.  */
+ 
+ 	  for (link = REG_NOTES (first); link; link = XEXP (link, 1))
+ 	    if (REG_NOTE_KIND (link) == REG_UNUSED
+ 		&& GET_CODE (XEXP (link, 0)) == REG)
+ 	      reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+ 	}
+ 
+       if (global && first == basic_block_end[b])
+ 	b++;
+       first = NEXT_INSN (first);
+     }
+   *p = 0;
+ }
+ 
  /* Main entry point for the reload pass.
  
     FIRST is the first insn of the function being compiled.
*************** reload (first, global, dumpfile)
*** 551,575 ****
    int something_changed;
    int something_needs_reloads;
    int something_needs_elimination;
-   int new_basic_block_needs;
    enum reg_class caller_save_spill_class = NO_REGS;
    int caller_save_group_size = 1;
  
    /* Nonzero means we couldn't get enough spill regs.  */
    int failure = 0;
  
-   /* The basic block number currently being processed for INSN.  */
-   int this_block;
- 
    /* Make sure even insns with volatile mem refs are recognizable.  */
    init_recog ();
  
    /* Enable find_equiv_reg to distinguish insns made by reload.  */
    reload_first_uid = get_max_uid ();
  
-   for (i = 0; i < N_REG_CLASSES; i++)
-     basic_block_needs[i] = 0;
- 
  #ifdef SECONDARY_MEMORY_NEEDED
    /* Initialize the secondary memory table.  */
    clear_secondary_mem ();
--- 716,739 ----
    int something_changed;
    int something_needs_reloads;
    int something_needs_elimination;
    enum reg_class caller_save_spill_class = NO_REGS;
    int caller_save_group_size = 1;
  
    /* Nonzero means we couldn't get enough spill regs.  */
    int failure = 0;
  
    /* Make sure even insns with volatile mem refs are recognizable.  */
    init_recog ();
  
+   reload_startobj = (char *) obstack_alloc (&reload_obstack, 0);
+   build_insn_chain (first, global);
+   reload_firstobj = (char *) obstack_alloc (&reload_obstack, 0);
+ 
+   spilled_pseudos = ALLOCA_REG_SET ();
+ 
    /* Enable find_equiv_reg to distinguish insns made by reload.  */
    reload_first_uid = get_max_uid ();
  
  #ifdef SECONDARY_MEMORY_NEEDED
    /* Initialize the secondary memory table.  */
    clear_secondary_mem ();
*************** reload (first, global, dumpfile)
*** 636,641 ****
--- 800,810 ----
    bzero ((char *) reg_max_ref_width, max_regno * sizeof (int));
    cannot_omit_stores = (char *) alloca (max_regno);
    bzero (cannot_omit_stores, max_regno);
+   reg_old_renumber = (short *) alloca (max_regno * sizeof (short));
+   bcopy (reg_renumber, reg_old_renumber, max_regno * sizeof (short));
+   pseudo_forbidden_regs
+     = (HARD_REG_SET *) alloca (max_regno * sizeof (HARD_REG_SET));
+   bzero ((char *) pseudo_forbidden_regs, max_regno * sizeof (HARD_REG_SET));
  
  #ifdef SMALL_REGISTER_CLASSES
    if (SMALL_REGISTER_CLASSES)
*************** reload (first, global, dumpfile)
*** 834,846 ****
    if (frame_pointer_needed)
      spill_hard_reg (HARD_FRAME_POINTER_REGNUM, global, dumpfile, 1);
  #endif
! 
!   if (global)
!     for (i = 0; i < N_REG_CLASSES; i++)
!       {
! 	basic_block_needs[i] = (char *) alloca (n_basic_blocks);
! 	bzero (basic_block_needs[i], n_basic_blocks);
!       }
  
    /* From now on, we need to emit any moves without making new pseudos.  */
    reload_in_progress = 1;
--- 1003,1009 ----
    if (frame_pointer_needed)
      spill_hard_reg (HARD_FRAME_POINTER_REGNUM, global, dumpfile, 1);
  #endif
!   finish_spills (global, dumpfile);
  
    /* From now on, we need to emit any moves without making new pseudos.  */
    reload_in_progress = 1;
*************** reload (first, global, dumpfile)
*** 860,865 ****
--- 1023,1029 ----
    something_needs_elimination = 0;
    while (something_changed)
      {
+       struct insn_chain *chain;
        rtx after_call = 0;
  
        /* For each class, number of reload regs needed in that class.
*************** reload (first, global, dumpfile)
*** 901,913 ****
        for (i = 0; i < N_REG_CLASSES; i++)
  	group_mode[i] = VOIDmode;
  
-       /* Keep track of which basic blocks are needing the reloads.  */
-       this_block = 0;
- 
-       /* Remember whether any element of basic_block_needs
- 	 changes from 0 to 1 in this pass.  */
-       new_basic_block_needs = 0;
- 
        /* Round size of stack frame to BIGGEST_ALIGNMENT.  This must be done
  	 here because the stack size may be a part of the offset computation
  	 for register elimination, and there might have been new stack slots
--- 1065,1070 ----
*************** reload (first, global, dumpfile)
*** 1022,1032 ****
        /* Compute the most additional registers needed by any instruction.
  	 Collect information separately for each class of regs.  */
  
!       for (insn = first; insn; insn = NEXT_INSN (insn))
  	{
! 	  if (global && this_block + 1 < n_basic_blocks
! 	      && insn == basic_block_head[this_block+1])
! 	    ++this_block;
  
  	  /* If this is a label, a JUMP_INSN, or has REG_NOTES (which
  	     might include REG_LABEL), we need to see what effects this
--- 1179,1188 ----
        /* Compute the most additional registers needed by any instruction.
  	 Collect information separately for each class of regs.  */
  
!       for (chain = reload_insn_chain; chain; chain = chain->next)
  	{
! 	  rtx insn = chain->insn;
! 	  int this_block = chain->block;
  
  	  /* If this is a label, a JUMP_INSN, or has REG_NOTES (which
  	     might include REG_LABEL), we need to see what effects this
*************** reload (first, global, dumpfile)
*** 1121,1143 ****
  			    spill_reg_order);
  
  	      /* Remember for later shortcuts which insns had any reloads or
! 		 register eliminations.
! 
! 		 One might think that it would be worthwhile to mark insns
! 		 that need register replacements but not reloads, but this is
! 		 not safe because find_reloads may do some manipulation of
! 		 the insn (such as swapping commutative operands), which would
! 		 be lost when we restore the old pattern after register
! 		 replacement.  So the actions of find_reloads must be redone in
! 		 subsequent passes or in reload_as_needed.
! 
! 		 However, it is safe to mark insns that need reloads
! 		 but not register replacement.  */
! 
! 	      PUT_MODE (insn, (did_elimination ? QImode
! 			       : n_reloads ? HImode
! 			       : GET_MODE (insn) == DImode ? DImode
! 			       : VOIDmode));
  
  	      /* Discard any register replacements done.  */
  	      if (did_elimination)
--- 1277,1285 ----
  			    spill_reg_order);
  
  	      /* Remember for later shortcuts which insns had any reloads or
! 		 register eliminations.  */
! 	      chain->need_elim = did_elimination;
! 	      chain->need_reload = n_reloads > 0;
  
  	      /* Discard any register replacements done.  */
  	      if (did_elimination)
*************** reload (first, global, dumpfile)
*** 1149,1161 ****
  		  something_needs_elimination = 1;
  		}
  
! 	      /* If this insn has no reloads, we need not do anything except
! 		 in the case of a CALL_INSN when we have caller-saves and
! 		 caller-save needs reloads.  */
  
! 	      if (n_reloads == 0
! 		  && ! (GET_CODE (insn) == CALL_INSN
! 			&& caller_save_spill_class != NO_REGS))
  		continue;
  
  	      something_needs_reloads = 1;
--- 1291,1299 ----
  		  something_needs_elimination = 1;
  		}
  
! 	      /* If this insn has no reloads, we need not do anything.  */
  
! 	      if (n_reloads == 0)
  		continue;
  
  	      something_needs_reloads = 1;
*************** reload (first, global, dumpfile)
*** 1183,1198 ****
  			  && ! reload_secondary_p[i]))
    		    continue;
  
- 		  /* Show that a reload register of this class is needed
- 		     in this basic block.  We do not use insn_needs and
- 		     insn_groups because they are overly conservative for
- 		     this purpose.  */
- 		  if (global && ! basic_block_needs[(int) class][this_block])
- 		    {
- 		      basic_block_needs[(int) class][this_block] = 1;
- 		      new_basic_block_needs = 1;
- 		    }
- 
  		  mode = reload_inmode[i];
  		  if (GET_MODE_SIZE (reload_outmode[i]) > GET_MODE_SIZE (mode))
  		    mode = reload_outmode[i];
--- 1321,1326 ----
*************** reload (first, global, dumpfile)
*** 1392,1464 ****
  			    insn_needs.other_addr.groups[i]);
  		}
  
- 	      /* If this is a CALL_INSN and caller-saves will need
- 		 a spill register, act as if the spill register is
- 		 needed for this insn.   However, the spill register
- 		 can be used by any reload of this insn, so we only
- 		 need do something if no need for that class has
- 		 been recorded.
- 
- 		 The assumption that every CALL_INSN will trigger a
- 		 caller-save is highly conservative, however, the number
- 		 of cases where caller-saves will need a spill register but
- 		 a block containing a CALL_INSN won't need a spill register
- 		 of that class should be quite rare.
- 
- 		 If a group is needed, the size and mode of the group will
- 		 have been set up at the beginning of this loop.  */
- 
- 	      if (GET_CODE (insn) == CALL_INSN
- 		  && caller_save_spill_class != NO_REGS)
- 		{
- 		  /* See if this register would conflict with any reload
- 		     that needs a group.  */
- 		  int nongroup_need = 0;
- 		  int *caller_save_needs;
- 
- 		  for (j = 0; j < n_reloads; j++)
- 		    if ((CLASS_MAX_NREGS (reload_reg_class[j],
- 					  (GET_MODE_SIZE (reload_outmode[j])
- 					   > GET_MODE_SIZE (reload_inmode[j]))
- 					  ? reload_outmode[j]
- 					  : reload_inmode[j])
- 			 > 1)
- 			&& reg_classes_intersect_p (caller_save_spill_class,
- 						    reload_reg_class[j]))
- 		      {
- 			nongroup_need = 1;
- 			break;
- 		      }
- 
- 		  caller_save_needs 
- 		    = (caller_save_group_size > 1
- 		       ? insn_needs.other.groups
- 		       : insn_needs.other.regs[nongroup_need]); 
- 
- 		  if (caller_save_needs[(int) caller_save_spill_class] == 0)
- 		    {
- 		      register enum reg_class *p
- 			= reg_class_superclasses[(int) caller_save_spill_class];
- 
- 		      caller_save_needs[(int) caller_save_spill_class]++;
- 
- 		      while (*p != LIM_REG_CLASSES)
- 			caller_save_needs[(int) *p++] += 1;
- 		    }
- 
- 		  /* Show that this basic block will need a register of
-                    this class.  */
- 
- 		  if (global
- 		      && ! (basic_block_needs[(int) caller_save_spill_class]
- 			    [this_block]))
- 		    {
- 		      basic_block_needs[(int) caller_save_spill_class]
- 			[this_block] = 1;
- 		      new_basic_block_needs = 1;
- 		    }
- 		}
- 
  #ifdef SMALL_REGISTER_CLASSES
  	      /* If this insn stores the value of a function call,
  		 and that value is in a register that has been spilled,
--- 1520,1525 ----
*************** reload (first, global, dumpfile)
*** 1530,1535 ****
--- 1591,1612 ----
  		}
  #endif /* SMALL_REGISTER_CLASSES */
  
+ 	      chain->needs_valid = global != 0;
+ 	      if (global)
+ 		{
+ 		  /* Record the needs of this insn if it did not need groups.  */
+ 		  for (i = 0; i < N_REG_CLASSES; i++)
+ 		    {
+ 		      chain->needs[i] = insn_needs.other.regs[0][i];
+ 		      if (insn_needs.other.groups[i] > 0 
+ 			  || insn_needs.other.regs[1][i] > 0)
+ 			{
+ 			  chain->needs_valid = 0;
+ 			  break;
+ 			}
+ 		    }
+ 		}
+ 
  	      /* For each class, collect maximum need of any insn.  */
  
  	      for (i = 0; i < N_REG_CLASSES; i++)
*************** reload (first, global, dumpfile)
*** 1590,1607 ****
  	       ep++)
  	    ep->previous_offset = ep->max_offset;
  
! 	  if ( ! setup_save_areas (&something_changed)
! 	      && caller_save_spill_class  == NO_REGS)
  	    {
! 	      /* The class we will need depends on whether the machine
! 		 supports the sum of two registers for an address; see
! 	      find_address_reloads for details.  */
! 
! 	      caller_save_spill_class
! 		= double_reg_address_ok ? INDEX_REG_CLASS : BASE_REG_CLASS;
! 	      caller_save_group_size
! 		= CLASS_MAX_NREGS (caller_save_spill_class, Pmode);
  	      something_changed = 1;
  	    }
  	}
  
--- 1667,1677 ----
  	       ep++)
  	    ep->previous_offset = ep->max_offset;
  
! 	  if (! setup_save_areas (&something_changed))
  	    {
! 	      spill_caller_saved_regs ();
  	      something_changed = 1;
+ 	      caller_save_needed = 0;
  	    }
  	}
  
*************** reload (first, global, dumpfile)
*** 1689,1695 ****
        for (i = 0; i < N_REG_CLASSES; i++)
  	if (max_needs[i] > 0 || max_groups[i] > 0 || max_nongroups[i] > 0)
  	  break;
!       if (i == N_REG_CLASSES && !new_basic_block_needs && ! something_changed)
  	break;
  
        /* Not all needs are met; must spill some hard regs.  */
--- 1759,1765 ----
        for (i = 0; i < N_REG_CLASSES; i++)
  	if (max_needs[i] > 0 || max_groups[i] > 0 || max_nongroups[i] > 0)
  	  break;
!       if (i == N_REG_CLASSES && ! something_changed)
  	break;
  
        /* Not all needs are met; must spill some hard regs.  */
*************** reload (first, global, dumpfile)
*** 1721,1732 ****
  	    if (potential_reload_regs[i] != -1)
  	      potential_reload_regs[j--] = potential_reload_regs[i];
  
! 	  for (i = 0; i < n_spills; i++)
  	    {
  	      potential_reload_regs[i] = spill_regs[i];
  	      spill_reg_order[spill_regs[i]] = -1;
  	      CLEAR_HARD_REG_BIT (forbidden_regs, spill_regs[i]);
  	    }
  
  	  n_spills = 0;
  	}
--- 1791,1804 ----
  	    if (potential_reload_regs[i] != -1)
  	      potential_reload_regs[j--] = potential_reload_regs[i];
  
! 	  for (i = 0; i < real_n_spills; i++)
  	    {
  	      potential_reload_regs[i] = spill_regs[i];
  	      spill_reg_order[spill_regs[i]] = -1;
  	      CLEAR_HARD_REG_BIT (forbidden_regs, spill_regs[i]);
  	    }
+ 	  bzero ((char *) pseudo_forbidden_regs, 
+ 		 max_regno * sizeof (HARD_REG_SET));
  
  	  n_spills = 0;
  	}
*************** reload (first, global, dumpfile)
*** 1755,1760 ****
--- 1827,1833 ----
  
        CLEAR_HARD_REG_SET (counted_for_groups);
        CLEAR_HARD_REG_SET (counted_for_nongroups);
+       CLEAR_HARD_REG_SET (used_spill_regs);
  
        for (class = 0; class < N_REG_CLASSES; class++)
  	{
*************** reload (first, global, dumpfile)
*** 2032,2037 ****
--- 2105,2112 ----
  				    global, dumpfile);
  	    }
  	}
+       real_n_spills = n_spills;
+       something_changed |= finish_spills (global, dumpfile);
      }
  
    /* If global-alloc was run, notify it of any register eliminations we have
*************** reload (first, global, dumpfile)
*** 2042,2054 ****
  	mark_elimination (ep->from, ep->to);
  
    /* Insert code to save and restore call-clobbered hard regs
!      around calls.  Tell if what mode to use so that we will process
!      those insns in reload_as_needed if we have to.  */
  
    if (caller_save_needed)
!     save_call_clobbered_regs (num_eliminable ? QImode
! 			      : caller_save_spill_class != NO_REGS ? HImode
! 			      : VOIDmode);
  
    /* If a pseudo has no hard reg, delete the insns that made the equivalence.
       If that insn didn't set the register (i.e., it copied the register to
--- 2117,2126 ----
  	mark_elimination (ep->from, ep->to);
  
    /* Insert code to save and restore call-clobbered hard regs
!      around calls.  */
  
    if (caller_save_needed)
!     save_call_clobbered_regs (num_eliminable != 0);
  
    /* If a pseudo has no hard reg, delete the insns that made the equivalence.
       If that insn didn't set the register (i.e., it copied the register to
*************** reload (first, global, dumpfile)
*** 2177,2182 ****
--- 2249,2256 ----
    if (real_at_ptr)
      free (real_at_ptr);
  
+   FREE_REG_SET (spilled_pseudos);
+ 
    if (scratch_list)
      free (scratch_list);
    scratch_list = 0;
*************** reload (first, global, dumpfile)
*** 2184,2193 ****
      free (scratch_block);
    scratch_block = 0;
  
!   CLEAR_HARD_REG_SET (used_spill_regs);
!   for (i = 0; i < n_spills; i++)
!     SET_HARD_REG_BIT (used_spill_regs, spill_regs[i]);
! 
    return failure;
  }
  \f
--- 2258,2264 ----
      free (scratch_block);
    scratch_block = 0;
  
!   obstack_free (&reload_obstack, reload_startobj);
    return failure;
  }
  \f
*************** spill_failure (insn)
*** 2358,2365 ****
      fatal_insn ("Unable to find a register to spill.", insn);
  }
  
! /* Add a new register to the tables of available spill-registers
!     (as well as spilling all pseudos allocated to the register).
     I is the index of this register in potential_reload_regs.
     CLASS is the regclass whose need is being satisfied.
     MAX_NEEDS and MAX_NONGROUPS are the vectors of needs,
--- 2429,2435 ----
      fatal_insn ("Unable to find a register to spill.", insn);
  }
  
! /* Add a new register to the tables of available spill-registers.
     I is the index of this register in potential_reload_regs.
     CLASS is the regclass whose need is being satisfied.
     MAX_NEEDS and MAX_NONGROUPS are the vectors of needs,
*************** new_spill_reg (i, class, max_needs, max_
*** 2377,2383 ****
       FILE *dumpfile;
  {
    register enum reg_class *p;
!   int val;
    int regno = potential_reload_regs[i];
  
    if (i >= FIRST_PSEUDO_REGISTER)
--- 2447,2453 ----
       FILE *dumpfile;
  {
    register enum reg_class *p;
!   int val = 0;
    int regno = potential_reload_regs[i];
  
    if (i >= FIRST_PSEUDO_REGISTER)
*************** statements or clauses.");
*** 2394,2400 ****
    spill_regs[n_spills] = regno;
    spill_reg_order[regno] = n_spills;
    if (dumpfile)
!     fprintf (dumpfile, "Spilling reg %d.\n", spill_regs[n_spills]);
  
    /* Clear off the needs we just satisfied.  */
  
--- 2464,2471 ----
    spill_regs[n_spills] = regno;
    spill_reg_order[regno] = n_spills;
    if (dumpfile)
!     fprintf (dumpfile, "Spilling reg %d.\n", regno);
!   SET_HARD_REG_BIT (used_spill_regs, regno);
  
    /* Clear off the needs we just satisfied.  */
  
*************** statements or clauses.");
*** 2412,2422 ****
  	max_nongroups[(int) *p++]--;
      }
  
-   /* Spill every pseudo reg that was allocated to this reg
-      or to something that overlaps this reg.  */
- 
-   val = spill_hard_reg (spill_regs[n_spills], global, dumpfile, 0);
- 
    /* If there are some registers still to eliminate and this register
       wasn't ever used before, additional stack space may have to be
       allocated to store this register.  Thus, we may have changed the offset
--- 2483,2488 ----
*************** spill_hard_reg (regno, global, dumpfile,
*** 3674,3744 ****
  	    + HARD_REGNO_NREGS (reg_renumber[i],
  				PSEUDO_REGNO_MODE (i))
  	    > regno))
!       {
! 	/* If this register belongs solely to a basic block which needed no
! 	   spilling of any class that this register is contained in,
! 	   leave it be, unless we are spilling this register because
! 	   it was a hard register that can't be eliminated.   */
  
! 	if (! cant_eliminate
! 	    && basic_block_needs[0]
! 	    && REG_BASIC_BLOCK (i) >= 0
! 	    && basic_block_needs[(int) class][REG_BASIC_BLOCK (i)] == 0)
! 	  {
! 	    enum reg_class *p;
  
! 	    for (p = reg_class_superclasses[(int) class];
! 		 *p != LIM_REG_CLASSES; p++)
! 	      if (basic_block_needs[(int) *p][REG_BASIC_BLOCK (i)] > 0)
! 		break;
  
! 	    if (*p == LIM_REG_CLASSES)
! 	      continue;
! 	  }
  
  	/* Mark it as no longer having a hard register home.  */
  	reg_renumber[i] = -1;
  	/* We will need to scan everything again.  */
  	something_changed = 1;
! 	if (global)
! 	  retry_global_alloc (i, forbidden_regs);
  
! 	alter_reg (i, regno);
! 	if (dumpfile)
  	  {
! 	    if (reg_renumber[i] == -1)
! 	      fprintf (dumpfile, " Register %d now on stack.\n\n", i);
! 	    else
! 	      fprintf (dumpfile, " Register %d now in %d.\n\n",
! 		       i, reg_renumber[i]);
  	  }
        }
!   for (i = 0; i < scratch_list_length; i++)
      {
!       if (scratch_list[i] && REGNO (scratch_list[i]) == regno)
  	{
! 	  if (! cant_eliminate && basic_block_needs[0]
! 	      && ! basic_block_needs[(int) class][scratch_block[i]])
! 	    {
! 	      enum reg_class *p;
! 
! 	      for (p = reg_class_superclasses[(int) class];
! 		   *p != LIM_REG_CLASSES; p++)
! 		if (basic_block_needs[(int) *p][scratch_block[i]] > 0)
! 		  break;
! 
! 	      if (*p == LIM_REG_CLASSES)
! 		continue;
! 	    }
! 	  PUT_CODE (scratch_list[i], SCRATCH);
! 	  scratch_list[i] = 0;
! 	  something_changed = 1;
! 	  continue;
  	}
      }
  
    return something_changed;
  }
  \f
  /* Find all paradoxical subregs within X and update reg_max_ref_width. 
     Also mark any hard registers used to store user variables as
--- 3740,4102 ----
  	    + HARD_REGNO_NREGS (reg_renumber[i],
  				PSEUDO_REGNO_MODE (i))
  	    > regno))
!       SET_REGNO_REG_SET (spilled_pseudos, i);
  
!   return something_changed;
! }
  
! /* When we notice that caller-saves would need a spill register, kick out all
!    pseudos which are in caller-saved hard registers and live across calls.  */
! static void
! spill_caller_saved_regs ()
! {
!   int i;
  
!   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
!     {
!       int regno = reg_renumber[i];
!       int nregs;
!       if (regno < 0 || REG_N_CALLS_CROSSED (i) > 0)
! 	continue;
!       nregs = HARD_REGNO_NREGS (regno, PSEUDO_REGNO_MODE (i));
!       while (nregs-- > 0)
! 	if (call_used_regs[regno++])
! 	  SET_REGNO_REG_SET (spilled_pseudos, i);
!     }
!   /* We no longer want caller-saves if retry_global_alloc is run.  */
!   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
!     if (call_used_regs[i])
!       SET_HARD_REG_BIT (losing_caller_save_reg_set, i);
! }
! 
! /* Set of registers which were used as a reload register, but are not
!    mentioned in spill_regs yet.  */
! static HARD_REG_SET non_spill_reload_regs;
! 
! /* Things that should be local variables in spill_pseudos, but aren't since
!    spill_sortfn needs them.  */
! static struct insn_chain *current_chain;
! static short spills_this_insn[FIRST_PSEUDO_REGISTER];
! 
! /* Called by qsort, to sort the spills_this_insn array in order of decreasing
!    preference for using a register as a reload register.  */
! static int
! spill_sortfn (a1, b1)
!      void *a1, *b1;
! {
!   short *a = (short *)a1, *b = (short *)b1;
!   int areg = *a, breg = *b;
!   int a_unallocated = (current_chain->inverse_renum_before[areg] == 0
! 		       && current_chain->inverse_renum_after[areg] == 0);
!   int b_unallocated = (current_chain->inverse_renum_before[breg] == 0
! 		       && current_chain->inverse_renum_after[breg] == 0);
!   if (a_unallocated && b_unallocated)
!     return areg < breg ? -1 : 1;
!   if (a_unallocated)
!     return -1;
!   if (b_unallocated)
!     return 1;
! 
!   /* Make the sort stable if both regs are allocated to a pseudo.  The 
!      original spill_regs is sorted by decreasing preference, and we want
!      to keep that order in this case.  */
!   return a-b;
! }
! 
! /* Given an insn needing reloading in CHAIN, find the best set of pseudos
!    to spill to satisfy its needs.  This function is used only in optimizing
!    compilation.
!    We consider even hard regs which do not appear in spill_regs as candidates
!    for reload regs.  The idea is that spill_regs determines a minimal set of
!    hard regs which will satisfy all the needs, and will cause minimal damage
!    when spilled.  However, most insns will need fewer reload regs.  For some
!    insns, some of the selected spill registers are not holding pseudos and are
!    therefore cheaper to use.  Also, some of the hard registers which have not
!    been marked as spill registers may be free in some insns that need reloads,
!    and it makes to use them instead of one of the spill registers, since that
!    avoids a spill.
!    When we use a hard register that is not in spill_regs yet, we must make
!    sure that it is eventually recorded there to prevent choose_reload_regs
!    from getting confused.  For that reason, such hard regs are entered into
!    non_spill_reload_regs.  */
! static void
! spill_pseudos (chain)
!      struct insn_chain *chain;
! {
!   int i;
!   short spills_this_insn[FIRST_PSEUDO_REGISTER];
!   int this_insn_n_spills;
! 
!   CLEAR_HARD_REG_SET (chain->spill_regs_available);
! 
!   /* If we can't tell exactly what the insn needs, spill all the
!      pseudos allocated to one of the spill regs within this insn.  */
!   if (! chain->needs_valid)
!     {
!       int i;
!       for (i = 0; i < n_spills; i++)
! 	{
! 	  int regno = chain->inverse_renum_before[spill_regs[i]];
! 	  if (regno > 0)
! 	    SET_REGNO_REG_SET (spilled_pseudos, regno);
! 	  regno = chain->inverse_renum_after[spill_regs[i]];
! 	  if (regno > 0)
! 	    SET_REGNO_REG_SET (spilled_pseudos, regno);
! 	  SET_HARD_REG_BIT (chain->spill_regs_available, spill_regs[i]);
! 	}
!       return;
!     }
! 
!   /* Make a copy of spill_regs and sort it in order of decreasing
!      preference for this insn.  */ 
!   bcopy (spill_regs, spills_this_insn, sizeof spills_this_insn);
!   
!   /* We may be able to use some more registers.  */
!   this_insn_n_spills = n_spills;
!   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
!     {
!       if (! fixed_regs[i]
! 	  && ! TEST_HARD_REG_BIT (forbidden_regs, i)
! 	  && ! TEST_HARD_REG_BIT (used_spill_regs, i)
! 	  && chain->inverse_renum_before[i] == 0
! 	  && chain->inverse_renum_after[i] == 0)
! 	spills_this_insn[this_insn_n_spills++] = i;
!     }
! 
!   current_chain = chain;
!   qsort (spills_this_insn, this_insn_n_spills, sizeof *spills_this_insn,
! 	 spill_sortfn);
!   
!   /* Now select as many spill regs as are needed for the insn.  */
! 
!   /* For each register class, find the best reload register to use.
!      Handle the register classes in ascending order.  */
!   for (i = 0; i < N_REG_CLASSES; i++)
!     {
!       int j;
!       if (chain->needs[i] == 0)
! 	continue;
! 
!       for (j = 0; j < this_insn_n_spills; j++)
! 	{
! 	  enum reg_class *p;
! 	  int spilled;
! 	  int regno = spills_this_insn[j];
! 
! 	  /* If this register was already spilled or does not fit
! 	     the class, ignore it.  */
! 	  if (! TEST_HARD_REG_BIT (reg_class_contents[i], regno)
! 	      || TEST_HARD_REG_BIT (chain->spill_regs_available, regno))
! 	    continue;
! 
! 	  /* Spill the register, and decrease the needs of this insn.  */
! 	  spilled = chain->inverse_renum_before[regno];
! 	  if (spilled > 0)
! 	    SET_REGNO_REG_SET (spilled_pseudos, spilled);
! 	  spilled = chain->inverse_renum_after[regno];
! 	  if (spilled > 0)
! 	    SET_REGNO_REG_SET (spilled_pseudos, spilled);
! 	  SET_HARD_REG_BIT (chain->spill_regs_available, regno);
! 
! 	  if (! TEST_HARD_REG_BIT (used_spill_regs, regno))
! 	    SET_HARD_REG_BIT (non_spill_reload_regs, regno);
! 
! 	  chain->needs[i]--;
! 	  p = reg_class_superclasses[(int) i];
! 	  while (*p != LIM_REG_CLASSES)
! 	    chain->needs[(int) *p++]--;
! 
! 	  if (chain->needs[i] == 0)
! 	    break;
! 	}
!     }
! }
! 
! /* After some hard regs have been marked as spill registers by new_spill_reg,
!    this function figures out which pseudos need to be spilled.
!    In non-optimizing compilation, every pseudo allocated to one of the
!    spill registers is spilled.  In optimizing compilation, we can be more
!    clever.  We know which insns need reloads, we know what registers are
!    live across them, and we usually know exactly what kinds of spill
!    registers they need, so we can spill a minimal set of pseudo regs to
!    satisfy the needs.  
!    Return nonzero if something changed and another iteration is necessary.  */
! static int
! finish_spills (global, dumpfile)
!      int global;
!      FILE *dumpfile;
! {
!   struct insn_chain *chain;
!   int something_changed = 0;
!   int i;
  
+   CLEAR_HARD_REG_SET (non_spill_reload_regs);
+ 
+   /* Find out which pseudos need to be spilled from the spill_regs array.  */
+   if (! global || n_spills == 0)
+     {
+       for (i = 0; i < n_spills; i++)
+ 	spill_hard_reg (spill_regs[i], 0, dumpfile, 0);
+     }
+   else
+     {
+       /* For every insn that needs reloads, satisfy its needs.  */
+       for (chain = reload_insn_chain; chain; chain = chain->next)
+ 	{
+ 	  if (! chain->need_reload)
+ 	    continue;
+ 	  spill_pseudos (chain);
+ 	}
+ 
+       /* For every insn that needs reloads, set the registers used as spill
+ 	 regs in pseudo_forbidden_regs for every pseudo live across the
+ 	 insn.  */
+       for (chain = reload_insn_chain; chain; chain = chain->next)
+ 	{
+ 	  rtx insn = chain->insn;
+ 	  if (! chain->need_reload)
+ 	    continue;
+ 	  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+ 	    {
+ 	      int j;
+ 	      int regno = chain->inverse_renum_before[i];
+ 	      if (regno > 0)
+ 		{
+ 		  if (chain->needs_valid)
+ 		    IOR_HARD_REG_SET (pseudo_forbidden_regs[regno],
+ 				      chain->spill_regs_available);
+ 		  else
+ 		    for (j = 0; j < n_spills; j++)
+ 		      SET_HARD_REG_BIT (pseudo_forbidden_regs[regno],
+ 					spill_regs[j]);
+ 		}
+ 
+ 	      regno = chain->inverse_renum_after[i];
+ 	      if (regno > 0)
+ 		{
+ 		  if (chain->needs_valid)
+ 		    IOR_HARD_REG_SET (pseudo_forbidden_regs[regno],
+ 				      chain->spill_regs_available);
+ 		  else
+ 		    for (j = 0; j < n_spills; j++)
+ 		      SET_HARD_REG_BIT (pseudo_forbidden_regs[regno],
+ 					spill_regs[j]);
+ 		}
+ 	    }
+ 	}
+     }
+ 
+       
+   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+     if (REGNO_REG_SET_P (spilled_pseudos, i))
+       {
  	/* Mark it as no longer having a hard register home.  */
  	reg_renumber[i] = -1;
  	/* We will need to scan everything again.  */
  	something_changed = 1;
!       }
  
!   /* Retry allocating the spilled pseudos.  */
!   if (global)
!     for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
!       if (reg_old_renumber[i] != reg_renumber[i])
! 	{
! 	  HARD_REG_SET forbidden;
! 	  COPY_HARD_REG_SET (forbidden, forbidden_regs);
! 	  IOR_HARD_REG_SET (forbidden, pseudo_forbidden_regs[i]);
! 	  retry_global_alloc (i, forbidden);
! 	}
!   
!   /* Fix up the register information in the insn chain.  */
!   if (global)
!     for (chain = reload_insn_chain; chain; chain = chain->next)
!       {
! 	regset live_before = ALLOCA_REG_SET();
! 	regset live_after = ALLOCA_REG_SET();
! 
! 	/* Figure out which off the spilled pseudos are live before and
! 	   after the insn.  Delete the old mappings in the inverse_renum
! 	   arrays.  */
! 	for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
  	  {
! 	    int regno = chain->inverse_renum_after[i];
! 	    if (regno >= FIRST_PSEUDO_REGISTER
! 		&& REGNO_REG_SET_P (spilled_pseudos, regno))
! 	      {
! 		chain->inverse_renum_after[i] = 0;
! 		SET_REGNO_REG_SET (live_after, regno);
! 	      }
! 
! 	    regno = chain->inverse_renum_before[i];
! 	    if (regno >= FIRST_PSEUDO_REGISTER
! 		&& REGNO_REG_SET_P (spilled_pseudos, regno))
! 	      {
! 		chain->inverse_renum_before[i] = 0;
! 		SET_REGNO_REG_SET (live_before, regno);
! 	      }
! 	  }
! 	/* Now set up the new mappings in inverse_renum.  */
! 	for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
! 	  {
! 	    int regno = reg_renumber[i];
! 	    if (regno < 0)
! 	      continue;
! 	    if (REGNO_REG_SET_P (live_before, i))
! 	      {
! 		int nregs = HARD_REGNO_NREGS (regno, PSEUDO_REGNO_MODE (i));
! 		while (nregs-- > 0)
! 		  chain->inverse_renum_before[regno++] = i;
! 	      }
! 	    if (REGNO_REG_SET_P (live_after, i))
! 	      {
! 		int nregs = HARD_REGNO_NREGS (regno, PSEUDO_REGNO_MODE (i));
! 		while (nregs-- > 0)
! 		  chain->inverse_renum_after[regno++] = i;
! 	      }
  	  }
+ 	/* Mark any unallocated hard regs as available for spills.  That
+ 	   makes inheritance work somewhat better.  */
+ 	for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+ 	  if (! fixed_regs[i]
+ 	      && ! TEST_HARD_REG_BIT (forbidden_regs, i)
+ 	      && chain->inverse_renum_before[i] == 0
+ 	      && chain->inverse_renum_after[i] == 0)
+ 	    SET_HARD_REG_BIT (chain->spill_regs_available, i);
+ 	FREE_REG_SET (live_before);
+ 	FREE_REG_SET (live_after);
        }
! 
!   /* Let alter_reg modify the reg rtx's for the modified pseudos.  */
!   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
      {
!       int regno = reg_renumber[i];
!       if (reg_old_renumber[i] == regno)
! 	continue;
!       
!       alter_reg (i, reg_old_renumber[i]);
!       reg_old_renumber[i] = regno;
!       if (dumpfile)
  	{
! 	  if (regno == -1)
! 	    fprintf (dumpfile, " Register %d now on stack.\n\n", i);
! 	  else
! 	    fprintf (dumpfile, " Register %d now in %d.\n\n",
! 		     i, reg_renumber[i]);
  	}
      }
  
+   /* We may need to fill up the spill_regs array so it contains all hard
+      regs.  */
+   if (global)
+     for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+       if (TEST_HARD_REG_BIT (non_spill_reload_regs, i))
+ 	spill_regs[n_spills++] = i;
+       
+   /* Show no pseudos were spilled for the next iteration.  */
+   CLEAR_REG_SET (spilled_pseudos);
    return something_changed;
  }
+ 
  \f
  /* Find all paradoxical subregs within X and update reg_max_ref_width. 
     Also mark any hard registers used to store user variables as
*************** reload_as_needed (first, live_known)
*** 3958,3966 ****
       rtx first;
       int live_known;
  {
!   register rtx insn;
    register int i;
-   int this_block = 0;
    rtx x;
    rtx after_call = 0;
  
--- 4316,4323 ----
       rtx first;
       int live_known;
  {
!   struct insn_chain *chain;
    register int i;
    rtx x;
    rtx after_call = 0;
  
*************** reload_as_needed (first, live_known)
*** 4001,4014 ****
  	spill_reg_order[spill_regs[i]] = i;
      }
  
!   for (insn = first; insn;)
      {
!       register rtx next = NEXT_INSN (insn);
! 
!       /* Notice when we move to a new basic block.  */
!       if (live_known && this_block + 1 < n_basic_blocks
! 	  && insn == basic_block_head[this_block+1])
! 	++this_block;
  
        /* If we pass a label, copy the offsets from the label information
  	 into the current offsets of each elimination.  */
--- 4358,4368 ----
  	spill_reg_order[spill_regs[i]] = i;
      }
  
!   for (chain = reload_insn_chain; chain; chain = chain->next)
      {
!       rtx insn = chain->insn;
!       rtx old_next = NEXT_INSN (insn);
!       int this_block = chain->block;
  
        /* If we pass a label, copy the offsets from the label information
  	 into the current offsets of each elimination.  */
*************** reload_as_needed (first, live_known)
*** 4068,4084 ****
  
  	  /* If we need to do register elimination processing, do so.
  	     This might delete the insn, in which case we are done.  */
! 	  if (num_eliminable && GET_MODE (insn) == QImode)
  	    {
  	      eliminate_regs_in_insn (insn, 1);
  	      if (GET_CODE (insn) == NOTE)
! 		{
! 		  insn = next;
! 		  continue;
! 		}
  	    }
  
! 	  if (GET_MODE (insn) == VOIDmode)
  	    n_reloads = 0;
  	  /* First find the pseudo regs that must be reloaded for this insn.
  	     This info is returned in the tables reload_... (see reload.h).
--- 4422,4435 ----
  
  	  /* If we need to do register elimination processing, do so.
  	     This might delete the insn, in which case we are done.  */
! 	  if (num_eliminable && chain->need_elim)
  	    {
  	      eliminate_regs_in_insn (insn, 1);
  	      if (GET_CODE (insn) == NOTE)
! 		continue;
  	    }
  
! 	  if (! chain->need_elim && ! chain->need_reload)
  	    n_reloads = 0;
  	  /* First find the pseudo regs that must be reloaded for this insn.
  	     This info is returned in the tables reload_... (see reload.h).
*************** reload_as_needed (first, live_known)
*** 4099,4124 ****
  	      rtx p;
  	      int class;
  
- 	      /* If this block has not had spilling done for a
- 		 particular clas and we have any non-optionals that need a
- 		 spill reg in that class, abort.  */
- 
- 	      for (class = 0; class < N_REG_CLASSES; class++)
- 		if (basic_block_needs[class] != 0
- 		    && basic_block_needs[class][this_block] == 0)
- 		  for (i = 0; i < n_reloads; i++)
- 		    if (class == (int) reload_reg_class[i]
- 			&& reload_reg_rtx[i] == 0
- 			&& ! reload_optional[i]
- 			&& (reload_in[i] != 0 || reload_out[i] != 0
- 			    || reload_secondary_p[i] != 0))
- 		      fatal_insn ("Non-optional registers need a spill register", insn);
- 
  	      /* Now compute which reload regs to reload them into.  Perhaps
  		 reusing reload regs from previous insns, or else output
  		 load insns to reload them.  Maybe output store insns too.
  		 Record the choices of reload reg in reload_reg_rtx.  */
! 	      choose_reload_regs (insn, avoid_return_reg);
  
  #ifdef SMALL_REGISTER_CLASSES
  	      /* Merge any reloads that we didn't combine for fear of 
--- 4450,4460 ----
  	      rtx p;
  	      int class;
  
  	      /* Now compute which reload regs to reload them into.  Perhaps
  		 reusing reload regs from previous insns, or else output
  		 load insns to reload them.  Maybe output store insns too.
  		 Record the choices of reload reg in reload_reg_rtx.  */
! 	      choose_reload_regs (chain, avoid_return_reg, live_known);
  
  #ifdef SMALL_REGISTER_CLASSES
  	      /* Merge any reloads that we didn't combine for fear of 
*************** reload_as_needed (first, live_known)
*** 4166,4172 ****
  
  	  /* There may have been CLOBBER insns placed after INSN.  So scan
  	     between INSN and NEXT and use them to forget old reloads.  */
! 	  for (x = NEXT_INSN (insn); x != next; x = NEXT_INSN (x))
  	    if (GET_CODE (x) == INSN && GET_CODE (PATTERN (x)) == CLOBBER)
  	      note_stores (PATTERN (x), forget_old_reloads_1);
  
--- 4502,4508 ----
  
  	  /* There may have been CLOBBER insns placed after INSN.  So scan
  	     between INSN and NEXT and use them to forget old reloads.  */
! 	  for (x = NEXT_INSN (insn); x != old_next; x = NEXT_INSN (x))
  	    if (GET_CODE (x) == INSN && GET_CODE (PATTERN (x)) == CLOBBER)
  	      note_stores (PATTERN (x), forget_old_reloads_1);
  
*************** reload_as_needed (first, live_known)
*** 4219,4226 ****
  	  }
  #endif
  
-       insn = next;
- 
  #ifdef USE_C_ALLOCA
        alloca (0);
  #endif
--- 4555,4560 ----
*************** allocate_reload_reg (r, insn, last_reloa
*** 5210,5223 ****
     finding a reload reg in the proper class.  */
  
  static void
! choose_reload_regs (insn, avoid_return_reg)
!      rtx insn;
       rtx avoid_return_reg;
  {
    register int i, j;
    int max_group_size = 1;
    enum reg_class group_class = NO_REGS;
    int inheritance;
  
    rtx save_reload_reg_rtx[MAX_RELOADS];
    char save_reload_inherited[MAX_RELOADS];
--- 5544,5559 ----
     finding a reload reg in the proper class.  */
  
  static void
! choose_reload_regs (chain, avoid_return_reg, global)
!      struct insn_chain *chain;
       rtx avoid_return_reg;
+      int global;
  {
    register int i, j;
    int max_group_size = 1;
    enum reg_class group_class = NO_REGS;
    int inheritance;
+   rtx insn = chain->insn;
  
    rtx save_reload_reg_rtx[MAX_RELOADS];
    char save_reload_inherited[MAX_RELOADS];
*************** choose_reload_regs (insn, avoid_return_r
*** 5258,5263 ****
--- 5594,5602 ----
        CLEAR_HARD_REG_SET (reload_reg_used_in_outaddr_addr[i]);
      }
  
+   if (global)
+     IOR_COMPL_HARD_REG_SET (reload_reg_used, chain->spill_regs_available);
+   
  #ifdef SMALL_REGISTER_CLASSES
    /* Don't bother with avoiding the return reg
       if we have no mandatory reload that could use it.  */
*** ./reload.h.orig-1	Sun Aug  3 10:58:29 1997
--- ./reload.h	Sun Aug 17 18:17:22 1997
*************** extern enum insn_code reload_in_optab[];
*** 130,135 ****
--- 130,178 ----
  extern enum insn_code reload_out_optab[];
  #endif
  
+ #ifdef SET_HARD_REG_BIT
+ /* This structure describes instructions which are relevant for reload.  */
+ struct insn_chain 
+ {
+   struct insn_chain *next, *prev;
+ 
+   /* The basic block this insn is in.  */
+   int block;
+   /* The rtx of the insn.  */
+   rtx insn;
+   /* Register life information: For each hard register, contains either
+      -1 if the hard register is live, 0 if it is not live, or a pseudo
+      register number if a pseudo that has been allocated to this hard
+      reg is live.  This information is recorded for the point immediately
+      before the insn (in inverse_renum_before), and for the point within
+      the insn at which all outputs have just been written to (in
+      inverse_renum_after).  */
+   int inverse_renum_before[FIRST_PSEUDO_REGISTER];
+   int inverse_renum_after[FIRST_PSEUDO_REGISTER];
+ 
+   /* Show which hard registers may be used as reload registers for this
+      insn.  Calculated in finish_spills.  */
+   HARD_REG_SET spill_regs_available;
+   /* Show the needs of this insn.  */
+   char needs[N_REG_CLASSES];
+   /* Nonzero if find_reloads said the insn requires reloading.  */
+   unsigned int need_reload:1;
+   /* Nonzero if eliminate_regs_in_insn said it requires eliminations.  */
+   unsigned int need_elim:1;
+   /* Nonzero if needs is valid.  If the insn requires groups, we do not
+      bother yet to record the needs; instead we use the old strategy of
+      spilling all pseudos allocated to spill registers.  */
+   unsigned int needs_valid:1;
+ };
+ 
+ /* A chain of insn_chain structures to describe all non-note insns in
+    a function.  */
+ extern struct insn_chain *reload_insn_chain;
+ 
+ /* Allocate a new insn_chain structure.  */
+ extern struct insn_chain *new_insn_chain PROTO((void));
+ #endif
+ 
  /* Functions from reload.c:  */
  
  /* Return a memory location that will be used to copy X in mode MODE.  
*************** extern void init_save_areas PROTO((void)
*** 239,242 ****
  extern int setup_save_areas PROTO((int *));
  
  /* Find the places where hard regs are live across calls and save them.  */
! extern void save_call_clobbered_regs PROTO((enum machine_mode));
--- 282,285 ----
  extern int setup_save_areas PROTO((int *));
  
  /* Find the places where hard regs are live across calls and save them.  */
! extern void save_call_clobbered_regs PROTO((int));
*** ./local-alloc.c.orig-1	Mon Aug 11 10:30:35 1997
--- ./local-alloc.c	Sun Aug 17 12:13:41 1997
*************** local_alloc ()
*** 421,427 ****
       reload, only allocate space for MAX_QTY SCRATCHes.  If there are more
       reload will allocate them.  */
  
!   scratch_list_length = max_qty;
    scratch_list = (rtx *) xmalloc (scratch_list_length * sizeof (rtx));
    bzero ((char *) scratch_list, scratch_list_length * sizeof (rtx));
    scratch_block = (int *) xmalloc (scratch_list_length * sizeof (int));
--- 421,427 ----
       reload, only allocate space for MAX_QTY SCRATCHes.  If there are more
       reload will allocate them.  */
  
!   scratch_list_length = 0;
    scratch_list = (rtx *) xmalloc (scratch_list_length * sizeof (rtx));
    bzero ((char *) scratch_list, scratch_list_length * sizeof (rtx));
    scratch_block = (int *) xmalloc (scratch_list_length * sizeof (int));
*** ./caller-save.c.orig-1	Sun Aug  3 11:04:20 1997
--- ./caller-save.c	Sun Aug 17 12:13:41 1997
*************** int n_regs_saved;
*** 80,88 ****
  
  static void set_reg_live		PROTO((rtx, rtx));
  static void clear_reg_live		PROTO((rtx));
! static void restore_referenced_regs	PROTO((rtx, rtx, enum machine_mode));
! static int insert_save_restore		PROTO((rtx, int, int,
! 					       enum machine_mode, int));
  \f
  /* Initialize for caller-save.
  
--- 80,89 ----
  
  static void set_reg_live		PROTO((rtx, rtx));
  static void clear_reg_live		PROTO((rtx));
! static void restore_referenced_regs	PROTO((rtx, struct insn_chain *,
! 					       int));
! static int insert_save_restore		PROTO((struct insn_chain *, int, int,
! 					       int, int));
  \f
  /* Initialize for caller-save.
  
*************** setup_save_areas (pchanged)
*** 342,499 ****
  \f
  /* Find the places where hard regs are live across calls and save them.
  
!    INSN_MODE is the mode to assign to any insns that we add.  This is used
     by reload to determine whether or not reloads or register eliminations
     need be done on these insns.  */
  
  void
! save_call_clobbered_regs (insn_mode)
!      enum machine_mode insn_mode;
  {
!   rtx insn;
    int b;
  
!   for (b = 0; b < n_basic_blocks; b++)
!     {
!       regset regs_live = basic_block_live_at_start[b];
!       rtx prev_block_last = PREV_INSN (basic_block_head[b]);
        int i, j;
        int regno;
  
!       /* Compute hard regs live at start of block -- this is the
! 	 real hard regs marked live, plus live pseudo regs that
! 	 have been renumbered to hard regs.  No registers have yet been
! 	 saved because we restore all of them before the end of the basic
! 	 block.  */
! 
!       REG_SET_TO_HARD_REG_SET (hard_regs_live, regs_live);
!       CLEAR_HARD_REG_SET (hard_regs_saved);
!       CLEAR_HARD_REG_SET (hard_regs_need_restore);
!       n_regs_saved = 0;
! 
!       EXECUTE_IF_SET_IN_REG_SET (regs_live, 0, i,
! 				 {
! 				   if ((regno = reg_renumber[i]) >= 0)
! 				     for (j = regno;
! 					  j < regno + HARD_REGNO_NREGS (regno,
! 									PSEUDO_REGNO_MODE (i));
! 					  j++)
! 				       SET_HARD_REG_BIT (hard_regs_live, j);
! 				 });
  
!       /* Now scan the insns in the block, keeping track of what hard
! 	 regs are live as we go.  When we see a call, save the live
! 	 call-clobbered hard regs.  */
  
!       for (insn = basic_block_head[b]; ; insn = NEXT_INSN (insn))
  	{
! 	  RTX_CODE code = GET_CODE (insn);
  
! 	  if (GET_RTX_CLASS (code) == 'i')
! 	    {
! 	      rtx link;
  
! 	      /* If some registers have been saved, see if INSN references
! 		 any of them.  We must restore them before the insn if so.  */
  
! 	      if (n_regs_saved)
! 		restore_referenced_regs (PATTERN (insn), insn, insn_mode);
  
! 	      /* NB: the normal procedure is to first enliven any
! 		 registers set by insn, then deaden any registers that
! 		 had their last use at insn.  This is incorrect now,
! 		 since multiple pseudos may have been mapped to the
! 		 same hard reg, and the death notes are ambiguous.  So
! 		 it must be done in the other, safe, order.  */
  
! 	      for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
! 		if (REG_NOTE_KIND (link) == REG_DEAD)
! 		  clear_reg_live (XEXP (link, 0));
  
! 	      /* When we reach a call, we need to save all registers that are
! 		 live, call-used, not fixed, and not already saved.  We must
! 		 test at this point because registers that die in a CALL_INSN
! 		 are not live across the call and likewise for registers that
! 		 are born in the CALL_INSN.
! 		 
! 		 If registers are filled with parameters for this function,
! 		 and some of these are also being set by this function, then
! 		 they will not appear to die (no REG_DEAD note for them),
! 		 to check if in fact they do, collect the set registers in
! 		 hard_regs_live first.  */
  
! 	      if (code == CALL_INSN)
! 		{
! 		  HARD_REG_SET this_call_sets;
! 		  {
! 		    HARD_REG_SET old_hard_regs_live;
  
! 		    /* Save the hard_regs_live information.  */
! 		    COPY_HARD_REG_SET (old_hard_regs_live, hard_regs_live);
  
! 		    /* Now calculate hard_regs_live for this CALL_INSN
! 		       only.  */
! 		    CLEAR_HARD_REG_SET (hard_regs_live);
! 		    note_stores (PATTERN (insn), set_reg_live);
! 		    COPY_HARD_REG_SET (this_call_sets, hard_regs_live);
  
! 		    /* Restore the hard_regs_live information.  */
! 		    COPY_HARD_REG_SET (hard_regs_live, old_hard_regs_live);
! 		  }
  
! 		  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 		    if (call_used_regs[regno] && ! call_fixed_regs[regno]
! 		        && TEST_HARD_REG_BIT (hard_regs_live, regno)
! 			/* It must not be set by this instruction.  */
! 		        && ! TEST_HARD_REG_BIT (this_call_sets, regno)
! 		        && ! TEST_HARD_REG_BIT (hard_regs_saved, regno))
! 		      regno += insert_save_restore (insn, 1, regno, 
! 						    insn_mode, 0);
  
! 		  /* Put the information for this CALL_INSN on top of what
! 		     we already had.  */
! 		  IOR_HARD_REG_SET (hard_regs_live, this_call_sets);
! 		  COPY_HARD_REG_SET (hard_regs_need_restore, hard_regs_saved);
  
! 		  /* Must recompute n_regs_saved.  */
! 		  n_regs_saved = 0;
! 		  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 		    if (TEST_HARD_REG_BIT (hard_regs_saved, regno))
! 		      n_regs_saved++;
! 		}
! 	      else
! 		{
! 		  note_stores (PATTERN (insn), set_reg_live);
  #ifdef AUTO_INC_DEC
- 		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
- 		    if (REG_NOTE_KIND (link) == REG_INC)
- 		      set_reg_live (XEXP (link, 0), NULL_RTX);
- #endif
- 		}
- 
  	      for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
! 		if (REG_NOTE_KIND (link) == REG_UNUSED)
! 		  clear_reg_live (XEXP (link, 0));
  	    }
! 
! 	  if (insn == basic_block_end[b])
! 	    break;
  	}
  
!       /* At the end of the basic block, we must restore any registers that
! 	 remain saved.  If the last insn in the block is a JUMP_INSN, put
! 	 the restore before the insn, otherwise, put it after the insn.  */
  
!       if (n_regs_saved)
! 	for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 	  if (TEST_HARD_REG_BIT (hard_regs_need_restore, regno))
! 	    regno += insert_save_restore ((GET_CODE (insn) == JUMP_INSN
! 				  ? insn : NEXT_INSN (insn)), 0,
! 				  regno, insn_mode, MOVE_MAX / UNITS_PER_WORD);
  
!       /* If we added any insns at the start of the block, update the start
! 	 of the block to point at those insns.  */
!       basic_block_head[b] = NEXT_INSN (prev_block_last);
      }
  }
  
--- 343,494 ----
  \f
  /* Find the places where hard regs are live across calls and save them.
  
!    NEED_ELIM is the mode to assign to any insns that we add.  This is used
     by reload to determine whether or not reloads or register eliminations
     need be done on these insns.  */
  
  void
! save_call_clobbered_regs (need_elim)
!      int need_elim;
  {
!   rtx prev_block_last;
!   struct insn_chain *chain;
    int b;
  
!   /* Now scan the insns in each block, keeping track of what hard
!      regs are live as we go.  When we see a call, save the live
!      call-clobbered hard regs.  */
! 
!   b = -1;
!   for (chain = reload_insn_chain; chain; chain = chain->next)
!     {      
        int i, j;
        int regno;
+       rtx insn = chain->insn;
+       enum rtx_code code = GET_CODE (insn);
+       
+       if (chain->block > b)
+ 	{
+ 	  b = chain->block;
+ 	  prev_block_last = PREV_INSN (basic_block_head[b]);
  
! 	  /* Compute hard regs live at start of block.  This information is
! 	     available in the insn chain.  */
  
! 	  CLEAR_HARD_REG_SET (hard_regs_live);
! 	  CLEAR_HARD_REG_SET (hard_regs_saved);
! 	  CLEAR_HARD_REG_SET (hard_regs_need_restore);
! 	  n_regs_saved = 0;
! 	  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
! 	    if (chain->inverse_renum_before[i] != 0)
! 	      SET_HARD_REG_BIT (hard_regs_live, i);
! 	}
  
!       if (GET_RTX_CLASS (code) == 'i')
  	{
! 	  rtx link;
  
! 	  /* If some registers have been saved, see if INSN references
! 	     any of them.  We must restore them before the insn if so.  */
  
! 	  if (n_regs_saved)
! 	    restore_referenced_regs (PATTERN (insn), chain, need_elim);
  
! 	  /* NB: the normal procedure is to first enliven any
! 	     registers set by insn, then deaden any registers that
! 	     had their last use at insn.  This is incorrect now,
! 	     since multiple pseudos may have been mapped to the
! 	     same hard reg, and the death notes are ambiguous.  So
! 	     it must be done in the other, safe, order.  */
  
! 	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
! 	    if (REG_NOTE_KIND (link) == REG_DEAD)
! 	      clear_reg_live (XEXP (link, 0));
  
! 	  /* When we reach a call, we need to save all registers that are
! 	     live, call-used, not fixed, and not already saved.  We must
! 	     test at this point because registers that die in a CALL_INSN
! 	     are not live across the call and likewise for registers that
! 	     are born in the CALL_INSN.
  
! 	     If registers are filled with parameters for this function,
! 	     and some of these are also being set by this function, then
! 	     they will not appear to die (no REG_DEAD note for them),
! 	     to check if in fact they do, collect the set registers in
! 	     hard_regs_live first.  */
  
! 	  if (code == CALL_INSN)
! 	    {
! 	      HARD_REG_SET this_call_sets;
! 	      {
! 		HARD_REG_SET old_hard_regs_live;
  
! 		/* Save the hard_regs_live information.  */
! 		COPY_HARD_REG_SET (old_hard_regs_live, hard_regs_live);
  
! 		/* Now calculate hard_regs_live for this CALL_INSN
! 		   only.  */
! 		CLEAR_HARD_REG_SET (hard_regs_live);
! 		note_stores (PATTERN (insn), set_reg_live);
! 		COPY_HARD_REG_SET (this_call_sets, hard_regs_live);
  
! 		/* Restore the hard_regs_live information.  */
! 		COPY_HARD_REG_SET (hard_regs_live, old_hard_regs_live);
! 	      }
  
! 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 		if (call_used_regs[regno] && ! call_fixed_regs[regno]
! 		    && TEST_HARD_REG_BIT (hard_regs_live, regno)
! 		    /* It must not be set by this instruction.  */
! 		    && ! TEST_HARD_REG_BIT (this_call_sets, regno)
! 		    && ! TEST_HARD_REG_BIT (hard_regs_saved, regno))
! 		  regno += insert_save_restore (chain, 1, regno, 
! 						need_elim, 0);
  
! 	      /* Put the information for this CALL_INSN on top of what
! 		 we already had.  */
! 	      IOR_HARD_REG_SET (hard_regs_live, this_call_sets);
! 	      COPY_HARD_REG_SET (hard_regs_need_restore, hard_regs_saved);
  
! 	      /* Must recompute n_regs_saved.  */
! 	      n_regs_saved = 0;
! 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 		if (TEST_HARD_REG_BIT (hard_regs_saved, regno))
! 		  n_regs_saved++;
! 	    }
! 	  else
! 	    {
! 	      note_stores (PATTERN (insn), set_reg_live);
  #ifdef AUTO_INC_DEC
  	      for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
! 		if (REG_NOTE_KIND (link) == REG_INC)
! 		  set_reg_live (XEXP (link, 0), NULL_RTX);
! #endif
  	    }
! 	  
! 	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
! 	    if (REG_NOTE_KIND (link) == REG_UNUSED)
! 	      clear_reg_live (XEXP (link, 0));
  	}
  
!       if (chain->next == 0 || chain->next->block > b)
! 	{
! 	  /* At the end of the basic block, we must restore any registers that
! 	     remain saved.  If the last insn in the block is a JUMP_INSN, put
! 	     the restore before the insn, otherwise, put it after the insn.  */
  
! 	  if (n_regs_saved)
! 	    for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 	      if (TEST_HARD_REG_BIT (hard_regs_need_restore, regno))
! 		regno += insert_save_restore ((GET_CODE (insn) == JUMP_INSN
! 					       ? chain : chain->next), 0,
! 					      regno, need_elim,
! 					      MOVE_MAX / UNITS_PER_WORD);
  
! 	  /* If we added any insns at the start of the block, update the start
! 	     of the block to point at those insns.  */
! 	  basic_block_head[b] = NEXT_INSN (prev_block_last);
! 	}
      }
  }
  
*************** clear_reg_live (reg)
*** 556,568 ****
  \f
  /* If any register currently residing in the save area is referenced in X,
     which is part of INSN, emit code to restore the register in front of INSN.
!    INSN_MODE is the mode to assign to any insns that we add.  */
  
  static void
! restore_referenced_regs (x, insn, insn_mode)
       rtx x;
!      rtx insn;
!      enum machine_mode insn_mode;
  {
    enum rtx_code code = GET_CODE (x);
    char *fmt;
--- 551,563 ----
  \f
  /* If any register currently residing in the save area is referenced in X,
     which is part of INSN, emit code to restore the register in front of INSN.
!    NEED_ELIM is the mode to assign to any insns that we add.  */
  
  static void
! restore_referenced_regs (x, chain, need_elim)
       rtx x;
!      struct insn_chain *chain;
!      int need_elim;
  {
    enum rtx_code code = GET_CODE (x);
    char *fmt;
*************** restore_referenced_regs (x, insn, insn_m
*** 581,591 ****
        if (regno >= FIRST_PSEUDO_REGISTER
  	  && reg_equiv_mem[regno] != 0)
  	restore_referenced_regs (XEXP (reg_equiv_mem[regno], 0),
! 				 insn, insn_mode);
        else if (regno >= FIRST_PSEUDO_REGISTER
  	       && reg_equiv_address[regno] != 0)
  	restore_referenced_regs (reg_equiv_address[regno],
! 				 insn, insn_mode);
  
        /* Otherwise if this is a hard register, restore any piece of it that
  	 is currently saved.  */
--- 576,586 ----
        if (regno >= FIRST_PSEUDO_REGISTER
  	  && reg_equiv_mem[regno] != 0)
  	restore_referenced_regs (XEXP (reg_equiv_mem[regno], 0),
! 				 chain, need_elim);
        else if (regno >= FIRST_PSEUDO_REGISTER
  	       && reg_equiv_address[regno] != 0)
  	restore_referenced_regs (reg_equiv_address[regno],
! 				 chain, need_elim);
  
        /* Otherwise if this is a hard register, restore any piece of it that
  	 is currently saved.  */
*************** restore_referenced_regs (x, insn, insn_m
*** 600,606 ****
  
  	  for (i = regno; i < endregno; i++)
  	    if (TEST_HARD_REG_BIT (hard_regs_need_restore, i))
! 	      i += insert_save_restore (insn, 0, i, insn_mode, saveregs);
  	}
  
        return;
--- 595,601 ----
  
  	  for (i = regno; i < endregno; i++)
  	    if (TEST_HARD_REG_BIT (hard_regs_need_restore, i))
! 	      i += insert_save_restore (chain, 0, i, need_elim, saveregs);
  	}
  
        return;
*************** restore_referenced_regs (x, insn, insn_m
*** 610,624 ****
    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
      {
        if (fmt[i] == 'e')
! 	restore_referenced_regs (XEXP (x, i), insn, insn_mode);
        else if (fmt[i] == 'E')
  	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
! 	  restore_referenced_regs (XVECEXP (x, i, j), insn, insn_mode);
      }
  }
  \f
  /* Insert a sequence of insns to save or restore, SAVE_P says which,
!    REGNO.  Place these insns in front of INSN.  INSN_MODE is the mode
     to assign to these insns.   MAXRESTORE is the maximum number of registers
     which should be restored during this call (when SAVE_P == 0).  It should
     never be less than 1 since we only work with entire registers.
--- 605,619 ----
    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
      {
        if (fmt[i] == 'e')
! 	restore_referenced_regs (XEXP (x, i), chain, need_elim);
        else if (fmt[i] == 'E')
  	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
! 	  restore_referenced_regs (XVECEXP (x, i, j), chain, need_elim);
      }
  }
  \f
  /* Insert a sequence of insns to save or restore, SAVE_P says which,
!    REGNO.  Place these insns in front of INSN.  NEED_ELIM is the mode
     to assign to these insns.   MAXRESTORE is the maximum number of registers
     which should be restored during this call (when SAVE_P == 0).  It should
     never be less than 1 since we only work with entire registers.
*************** restore_referenced_regs (x, insn, insn_m
*** 632,645 ****
     Return the extra number of registers saved.  */
  
  static int
! insert_save_restore (insn, save_p, regno, insn_mode, maxrestore)
!      rtx insn;
       int save_p;
       int regno;
!      enum machine_mode insn_mode;
       int maxrestore;
  {
    rtx pat;
    enum insn_code code;
    int i, numregs;
  
--- 627,642 ----
     Return the extra number of registers saved.  */
  
  static int
! insert_save_restore (chain, save_p, regno, need_elim, maxrestore)
!      struct insn_chain *chain;
       int save_p;
       int regno;
!      int need_elim;
       int maxrestore;
  {
+   struct insn_chain *new;
    rtx pat;
+   rtx insn = chain->insn;
    enum insn_code code;
    int i, numregs;
  
*************** insert_save_restore (insn, save_p, regno
*** 662,668 ****
  
    if ((GET_CODE (insn) == INSN || GET_CODE (insn) == JUMP_INSN)
        && reg_referenced_p (cc0_rtx, PATTERN (insn)))
!     insn = prev_nonnote_insn (insn);
  #endif
  
    /* Get the pattern to emit and update our status.  */
--- 659,665 ----
  
    if ((GET_CODE (insn) == INSN || GET_CODE (insn) == JUMP_INSN)
        && reg_referenced_p (cc0_rtx, PATTERN (insn)))
!     chain = chain->prev, insn = chain->insn;
  #endif
  
    /* Get the pattern to emit and update our status.  */
*************** insert_save_restore (insn, save_p, regno
*** 749,758 ****
          }
      }
    /* Emit the insn and set the code and mode.  */
! 
!   insn = emit_insn_before (pat, insn);
!   PUT_MODE (insn, insn_mode);
!   INSN_CODE (insn) = code;
  
    /* Tell our callers how many extra registers we saved/restored */
    return numregs - 1;
--- 746,761 ----
          }
      }
    /* Emit the insn and set the code and mode.  */
!   
!   new = new_insn_chain ();
!   new->prev = chain->prev;
!   new->prev->next = new;
!   chain->prev = new;
!   new->next = chain;
!   new->insn = emit_insn_before (pat, insn);
!   new->block = chain->block;
!   new->need_reload = 0;
!   new->need_elim = need_elim;
  
    /* Tell our callers how many extra registers we saved/restored */
    return numregs - 1;
*** ./reorg.c.orig-1	Sun Aug 17 16:22:16 1997
--- ./reorg.c	Sun Aug 17 16:22:35 1997
*************** find_dead_or_set_registers (target, res,
*** 2507,2512 ****
--- 2507,2513 ----
  	  AND_COMPL_HARD_REG_SET (res->regs, pending_dead_regs);
  	  CLEAR_HARD_REG_SET (pending_dead_regs);
  
+ #if 0 /* @@@ This no longer works now that reload is more clever.  */
  	  if (CODE_LABEL_NUMBER (insn) < max_label_num_after_reload)
  	    {
  	      /* All spill registers are dead at a label, so kill all of the
*************** find_dead_or_set_registers (target, res,
*** 2515,2520 ****
--- 2516,2522 ----
  	      AND_COMPL_HARD_REG_SET (scratch, needed.regs);
  	      AND_COMPL_HARD_REG_SET (res->regs, scratch);
  	    }
+ #endif
  	  continue;
  
  	case BARRIER:

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Reload patch to improve 386 code
  1997-08-18  8:13 Test result for 970814 on native sparc-sun-solaris2.5.1 Klaus Kaempf
@ 1997-08-18  8:22 ` Bernd Schmidt
  1997-08-18 10:11 ` Bernd Schmidt
  1 sibling, 0 replies; 33+ messages in thread
From: Bernd Schmidt @ 1997-08-18  8:22 UTC (permalink / raw)
  To: egcs

I should have mentioned the patch is against gcc2 snapshot 970802.

Bernd

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~1998-09-07 19:09 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1997-08-18 14:53 Reload patch to improve 386 code Toon Moene
     [not found] <Pine.SOL.3.90.970819095143.291G-100000@starsky.informatik.rwth-aachen.de>
1998-09-04  5:03 ` Jeffrey A Law
1998-09-04  5:05   ` Bernd Schmidt
1998-09-04 22:36     ` Joern Rennecke
1998-09-05  4:45     ` Jeffrey A Law
1998-09-05 15:49       ` Joern Rennecke
1998-09-05 15:44         ` Jeffrey A Law
1998-09-06  1:09           ` Mark Mitchell
1998-09-06  1:09             ` Jeffrey A Law
1998-09-07  9:48       ` Bernd Schmidt
1998-09-07 19:09         ` Joern Rennecke
1998-09-07 18:26           ` Jeffrey A Law
1998-09-07 19:09         ` Jeffrey A Law
1998-09-04 21:38   ` Joern Rennecke
1998-09-04 21:38     ` Jeffrey A Law
  -- strict thread matches above, loose matches on Subject: below --
1997-08-21 16:51 Problems on PowerPC David Edelsohn
1997-08-21 17:37 ` Reload patch to improve 386 code Bernd Schmidt
1997-08-21 16:51 Joern Rennecke
1997-08-21 15:20 egcs repository Joel Sherrill
1997-08-21 15:47 ` Reload patch to improve 386 code Bernd Schmidt
1997-08-19 19:00 Joern Rennecke
1997-08-19  8:50 Jakub Jelinek
1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
1997-08-19  8:08 ` Bernd Schmidt
1997-08-19  8:08 ` Bernd Schmidt
1997-08-18 20:47 Mike Meissner
1997-08-18 20:46 Joern Rennecke
1997-08-18 18:55 David S. Miller
1997-08-18 19:09 ` Jeffrey A Law
1997-08-18 15:36 Toon Moene
1997-08-18 15:11 meissner
1997-08-18 14:53 meissner
1997-08-18 14:53 Monday morning Philippe Laliberte
1997-08-18 14:54 ` Reload patch to improve 386 code Jeffrey A Law
1997-08-18  8:13 Test result for 970814 on native sparc-sun-solaris2.5.1 Klaus Kaempf
1997-08-18  8:22 ` Reload patch to improve 386 code Bernd Schmidt
1997-08-18 10:11 ` Bernd Schmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).