From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeffrey A Law <law@hurl.cygnus.com>
To: Zack Weinberg <zack@rabi.columbia.edu>
Cc: Joern Rennecke <amylaar@cygnus.co.uk>, egcs@egcs.cygnus.com
Subject: Re: a strange infelicity of register allocation 
Date: Thu, 28 Jan 1999 18:01:00 -0000
Message-id: <441.917575023@hurl.cygnus.com>
References: <199901290119.UAA29185@blastula.phys.columbia.edu>
X-SW-Source: 1999-01/msg00336.html

  In message <199901290119.UAA29185@blastula.phys.columbia.edu> you write:
  > On Thu, 28 Jan 1999 06:22:55 -0700, Jeffrey A Law wrote:
  > >
  > >  In message <199901271725.MAA20390@blastula.phys.columbia.edu> you write:
  > >  > Nope.  No spills of reg 2 anywhere.
  > >Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
  > >think of a reason why it would not be used..
  > 
  > I can't reproduce this exact situation anymore, but it seems to want
  > to use reg 2 (and sometimes reg 1 also) for scratch purposes
  > exclusively.
Something is definitely not working as expected.  I have vague memories of the
allocators not wanting to take the last register, to avoid some pathological
problem in reload, but I can't find evidence of that code anymore
(this is separate from the CLASS_LIKELY_SPILLED stuff in local-alloc.c).

If this shows up again, we definitely want to dive deeper into it.


  > Flow has the correct live range for pseudo 32, and the counts in lreg
  > are sane.  It now seems to get it right even when the variable is
  > declared with function scope.
Very odd.  So are you saying that pseudos 30 and 32 no longer conflict, and
as such can share a register?


  > The problem may have been with some atrociously tangled EOF handling
  > code which I have now thrown away.  There was another variable
  > trivially derived from `count' which was used in the inner loop. Some
  > pass (loop?) may have collapsed the two variables into one.
Quite possible.  It could be one of a number of passes: cprop, regmove, local
and global all have some ability to tie registers together.

That's an interesting thought: what kind of heuristics would be useful
(particularly in local-alloc.c) to guess when tying regs together is going
to lose?

  > 
  > Now here's an interesting thing.  The inner loop was originally coded
  > 
  > for(;;)
  > {
  >     unsigned char c;
  >     c = *ip++;
  >     switch(c)
  >     {
  > 	default:
  > 	    *op++ = c;
  > 	    break;
  > 	/* more cases here */
  >     }
  > }
  > 
  > The top of the loop produced RTL like this:
  > 
  > (insn 166 162 167 (set (reg/v:QI 44)
  >         (mem:QI (reg/v:SI 27) 0)) -1 (nil)
  >     (nil))
  > 
  > (insn 167 166 169 (set (reg/v:SI 27)
  >         (plus:SI (reg/v:SI 27)
  >             (const_int 1))) -1 (nil)
  >     (nil))
  > 
  > (note 169 167 476 "" NOTE_INSN_DELETED)
  > 
  > (insn 476 169 477 (set (reg:SI 76)
  >         (zero_extend:SI (reg/v:QI 44))) -1 (nil)
  >     (nil))
  > 
  > (insn 477 476 478 (set (cc0)
  >         (compare (reg:SI 76)
  >             (const_int 10))) -1 (nil)
  >     (nil))
  > ;; switch continues...
  > 
  > No pass was able to collapse regs 44 and 76 together, and we'd end up
  > with assembly output like so:
Well, I don't know which dump this came from, but I don't see a REG_DEAD note
for reg 44 on insn 477.  Without the REG_DEAD note, the combination
opportunities are much more limited.

--


  > Another problem which could be related to Marc's code-size issues.
  > The stack frame generated for this function has a ~4K buffer and a
  > bunch of spilled pseudos.  The frame is laid out with the buffer
  > nearer to the frame pointer than the spills, so all the stack slot
  > offsets are large (range 4100-4150) and the assembler is forced to use
  > 32bit displacements.  If the spills were put next to the frame
  > pointer, the assembler could use 8bit displacements.  I can simulate
  > this by using alloca to get the buffer; the code is almost identical,
  > but all the displacements are in the +-128 range and the object code
  > shrinks by 150 bytes.  (It does get it right with -fomit-frame-pointer
  > on.)
  > 
  > Using alloca is not an ideal workaround, because gcc insists on giving
  > the buffer's base a stack slot when it could perfectly well use the
  > machine stack pointer.  This would take some cleverness, but the logic
  > is the same that's needed for -fomit-frame-pointer.
This kind of optimization is nontrivial, particularly when you have to work
on targets where the validity of an address may depend on those offsets.  You
move something, its offset is suddenly too big, you need to reload the
address, and blam, you've got to move something else...

Even for a target like the x86, where the displacements are all valid, it's
not an easy problem.  In fact, it's register allocation all over again, but
with a different metric.  You want to pack the most heavily used stack slots
(which could be spills, locals, temporaries, etc.) into the smallest offsets,
and less used things further away...

jeff