public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* a strange infelicity of register allocation
@ 1999-01-23 22:42 Zack Weinberg
  1999-01-25  4:56 ` Joern Rennecke
  1999-01-31 23:58 ` Zack Weinberg
  0 siblings, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-23 22:42 UTC (permalink / raw)
  To: egcs

Consider this loop:

char *p = buffer, *op = out_buffer;
for (;;)
{
    char c = *p++;
    switch (c)
    {
	case '\0':
	   goto out;
	case '\n':
	   /* stuff... */
	   break;
	case '\r':
	    /* similar stuff... */
	    break;
	case '?':
	   /* other stuff... */
	   break;
	default:
	   *op++ = c;
    }
}
out:

Where each special case is long and complicated.  Some of them re-use
the variable c for their own purposes, but the initial store to c is
dead at the beginning of each special case.

The code generated copies c = *p into a register, then out to a stack
slot.  It then does the switch based on the contents of the register.
Down in the default case, c is copied back in from the stack slot and
then written to memory.

If I move the default case to the top of the switch, so it begins like

switch (c)
{
    default:
	*op++ = c;
	break;

then c gets to stay in a register all the time, and the code runs
about twice as fast.  It seems to me we should be generating identical
code for both cases.

Platform is x86; problem was seen with both egcs 1.1.1 and the current
snapshot.  The actual code I'm talking about is the function
read_and_prescan in the patch for newline handling in cpplib which I
just posted.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-23 22:42 a strange infelicity of register allocation Zack Weinberg
@ 1999-01-25  4:56 ` Joern Rennecke
  1999-01-25 11:47   ` Zack Weinberg
  1999-01-31 23:58   ` Joern Rennecke
  1999-01-31 23:58 ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-25  4:56 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

> then c gets to stay in a register all the time, and the code runs
> about twice as fast.  It seems to me we should be generating identical
> code for both cases.

Reload inheritance follows only the fall-through path of a branch.
This might be the reason for the differences you are seeing.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25  4:56 ` Joern Rennecke
@ 1999-01-25 11:47   ` Zack Weinberg
  1999-01-25 11:50     ` Joern Rennecke
                       ` (2 more replies)
  1999-01-31 23:58   ` Joern Rennecke
  1 sibling, 3 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 11:47 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: egcs

On Mon, 25 Jan 1999 12:55:37 +0000 (GMT), Joern Rennecke wrote:
>> then c gets to stay in a register all the time, and the code runs
>> about twice as fast.  It seems to me we should be generating identical
>> code for both cases.
>
>Reload inheritance follows only the fall-through path of a branch.
>This might be the reason for the differences you are seeing.

Mm, possibly.  Is there a reason for that?

The generated code is a little silly even when adjusted:

	movb	(%esi),%dl
	incl	%esi
	xorl	%eax, %eax
	movb	%dl, %eax
	[pick case branch based on contents of eax...]

We could just as well do it

	xorl	%eax, %eax
	movb	(%esi), %al
	incl	%esi
	[switch...]

and free up a register, which is no small thing on x86.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 11:47   ` Zack Weinberg
@ 1999-01-25 11:50     ` Joern Rennecke
  1999-01-31 23:58       ` Joern Rennecke
  1999-01-25 12:00     ` Jeffrey A Law
  1999-01-31 23:58     ` Zack Weinberg
  2 siblings, 1 reply; 54+ messages in thread
From: Joern Rennecke @ 1999-01-25 11:50 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

> >Reload inheritance follows only the fall-through path of a branch.
> >This might be the reason for the differences you are seeing.
> 
> Mm, possibly.  Is there a reason for that?

Yes.  Reload keeps track of values in registers as it scans the instructions
one after the other.  When a label is reached, it discards the information.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 11:47   ` Zack Weinberg
  1999-01-25 11:50     ` Joern Rennecke
@ 1999-01-25 12:00     ` Jeffrey A Law
  1999-01-25 13:14       ` John S. Dyson
                         ` (2 more replies)
  1999-01-31 23:58     ` Zack Weinberg
  2 siblings, 3 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-25 12:00 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901251946.OAA06903@blastula.phys.columbia.edu >you write:
  > On Mon, 25 Jan 1999 12:55:37 +0000 (GMT), Joern Rennecke wrote:
  > >> then c gets to stay in a register all the time, and the code runs
  > >> about twice as fast.  It seems to me we should be generating identical
  > >> code for both cases.
  > >
  > >Reload inheritance follows only the fall-through path of a branch.
  > >This might be the reason for the differences you are seeing.
  > 
  > Mm, possibly.  Is there a reason for that?
Yes, it allows you to inherit more often.


  > The generated code is a little silly even when adjusted:
  > 
  > 	movb	(%esi),%dl
  > 	incl	%esi
  > 	xorl	%eax, %eax
  > 	movb	%dl, %eax
  > 	[pick case branch based on contents of eax...]
  > 
  > We could just as well do it
  > 
  > 	xorl	%eax, %eax
  > 	movb	(%esi), %al
  > 	incl	%esi
  > 	[switch...]
  > 
  > and free up a register, which is no small thing on x86.
The second sequence may actually be slower though.  The use of %eax and %al
may trigger an interlock.  Someone would need to check.

If we spilled, then we're probably going to lose regardless.  That's why
I'd prefer to see us work on avoid the spills to start with by tightening
up the x86 machine description.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 12:00     ` Jeffrey A Law
@ 1999-01-25 13:14       ` John S. Dyson
  1999-01-31 23:58         ` John S. Dyson
  1999-01-25 13:31       ` Zack Weinberg
  1999-01-31 23:58       ` Jeffrey A Law
  2 siblings, 1 reply; 54+ messages in thread
From: John S. Dyson @ 1999-01-25 13:14 UTC (permalink / raw)
  To: law; +Cc: zack, amylaar, egcs

Jeffrey A Law said:
>   > 
>   > 	xorl	%eax, %eax
>   > 	movb	(%esi), %al
>   > 	incl	%esi
>   > 	[switch...]
>   > 
>   > and free up a register, which is no small thing on x86.
> The second sequence may actually be slower though.  The use of %eax and %al
> may trigger an interlock.  Someone would need to check.
> 
There are special hacks in the P6 that make the above sequence
work well.  That used to be the "correct" idiom on P5 and earlier
(because movzbl was slow.)  Since that was so common, the P6 does
execute the above code very well.

The only thing wrong with the above sequence (taken alone) is that
there might still be a stall from before the xorl.  That case doesn't
happen very often, so xorl %reg,%reg is still mostly better than
movl $0, %reg.  (The xorl doesn't know that the value after it's
operation is invariant based upon it's previous value, while the
movl does know.)  The movl is very big (the constant is a full
32 bits long), so is generally less desirable.

-- 
John                  | Never try to teach a pig to sing,
dyson@iquest.net      | it makes one look stupid
jdyson@nc.com         | and it irritates the pig.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 12:00     ` Jeffrey A Law
  1999-01-25 13:14       ` John S. Dyson
@ 1999-01-25 13:31       ` Zack Weinberg
  1999-01-25 13:36         ` Jeffrey A Law
                           ` (2 more replies)
  1999-01-31 23:58       ` Jeffrey A Law
  2 siblings, 3 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 13:31 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 12:56:16 -0700, Jeffrey A Law wrote:
>
>  In message < 199901251946.OAA06903@blastula.phys.columbia.edu >you write:
>
>  > The generated code is a little silly even when adjusted:
[...]
>The second sequence may actually be slower though.  The use of %eax and %al
>may trigger an interlock.  Someone would need to check.
>
>If we spilled, then we're probably going to lose regardless.  That's why
>I'd prefer to see us work on avoid the spills to start with by tightening
>up the x86 machine description.

The code is actually much worse than I thought, even when tweaked.

Input:

unsigned char *ip, *op;
for(;;)
{
    unsigned char c;
    c = *ip++;
    switch(c)
    {
	default: *op++ = c; break;
	case '\0': goto eof;
	case '\r': /* ... */ break;
	case '\n': /* ... */ break;
	case '?':  /* ... */ break;
    }
}
eof:

Current snapshot, -O2:

switch:
	movb (%esi),%al
	movb %al,-4125(%ebp)
	incl %esi
	xorl %eax,%eax
	movb -4125(%ebp),%al
	cmpl $10,%eax
	je case_one
	jg case_two_or_three
	testl %eax,%eax
	je case_four
	jmp default
	.p2align 4,,7
case_two_or_three:
	cmpl $13,%eax
	je case_two
	cmpl $63,%eax
	je case_three
default:
	movb -4125(%ebp),%dl
	movb %dl,(%ebx)
	incl %ebx
	jmp switch
	.p2align 4,,7
...

-4125(%ebp) will be in cache, of course, but this is still pretty
disgusting.

There is questionable code elsewhere, such as:

	decl -4112(%ebp)
	movl -4112(%ebp),%ecx

- if this were written the other way around, you'd avoid an address
generation, and a read-mod-write to memory; note that this particular
location is unlikely to be in L1.  Also the code would be smaller.
This looks like the problem Marc Espie keeps complaining about.

.L331:
	cmpb $63,-4125(%ebp)
	jne .L349
	movb 1(%esi),%cl
	movb %cl,-4125(%ebp)
	testb %cl,%cl
	jne .L333
	cmpl $0,-4120(%ebp)
	jne .L333
	decl -4112(%ebp)
	movl -4112(%ebp),%eax
	movb $63,(%eax)
	decl %eax
	movl %eax,-4112(%ebp)
	movb $63,(%eax)
	jmp .L311
	.p2align 4,,7
.L333:
	movzbl -4125(%ebp),%edi
	cmpb $0,trigraph_table(%edi)
	jne .L334

Here the store to -4125(%ebp) would be dead if it weren't used in the
movzbl at .L333.  I guess this is what Joern said about reloads not
inheriting down branches taken.

C:
	buf = xrealloc (buf, op - buf);
	fp->buf = buf;
	return op - buf;

becomes

	movl %ebx,%eax
	subl -4104(%ebp),%eax
	pushl %eax
	movl -4104(%ebp),%edx
	pushl %edx
	call xrealloc
	movl %eax,-4104(%ebp)
	movl 12(%ebp),%eax
	movl -4104(%ebp),%ecx
	movl %ecx,(%eax)
	subl %ecx,%ebx
	movl %ebx,%eax
	jmp epilogue

This could have been done

	movl -4104(%ebp), %eax
	subl %eax, %ebx
	pushl %ebx
	pushl %eax
	call xrealloc
	movl 12(%ebp), %ecx
	movl %eax, (%ecx)
	movl %ebx, %eax
	jmp epilogue

I think the key issue here is that it doesn't seem to know to preserve
values in call-saved registers over a function call.  It's somewhat
silly to be microoptimizing around a call to xrealloc, but this sort
of code is all over the place.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:31       ` Zack Weinberg
@ 1999-01-25 13:36         ` Jeffrey A Law
  1999-01-25 19:56           ` Zack Weinberg
  1999-01-31 23:58           ` Jeffrey A Law
  1999-01-26  7:44         ` Joern Rennecke
  1999-01-31 23:58         ` Zack Weinberg
  2 siblings, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-25 13:36 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901252130.QAA07655@blastula.phys.columbia.edu >you write:
  > Current snapshot, -O2:
  > 
  > switch:
  > 	movb (%esi),%al
  > 	movb %al,-4125(%ebp)
  > 	incl %esi
  > 	xorl %eax,%eax
  > 	movb -4125(%ebp),%al
  > 	cmpl $10,%eax
  > 	je case_one
  > 	jg case_two_or_three
  > 	testl %eax,%eax
  > 	je case_four
  > 	jmp default
  > 	.p2align 4,,7
  > case_two_or_three:
  > 	cmpl $13,%eax
  > 	je case_two
  > 	cmpl $63,%eax
  > 	je case_three
  > default:
  > 	movb -4125(%ebp),%dl
  > 	movb %dl,(%ebx)
  > 	incl %ebx
  > 	jmp switch
  > 	.p2align 4,,7
  > ...
  > 
  > -4125(%ebp) will be in cache, of course, but this is still pretty
  > disgusting.
  > 
  > There is questionable code elsewhere, such as:
  > 
  > 	decl -4112(%ebp)
  > 	movl -4112(%ebp),%ecx
  > 
  > - if this were written the other way around, you'd avoid an address
  > generation, and a read-mod-write to memory; note that this particular
  > location is unlikely to be in L1.  Also the code would be smaller.
  > This looks like the problem Marc Espie keeps complaining about.
First, find out where and why we're getting this code.  Are we getting
reloads?  Look at the dump outputs, they're going to tell you more than
I ever could from looking at the assembly.

  > I think the key issue here is that it doesn't seem to know to preserve
  > values in call-saved registers over a function call.  It's somewhat
  > silly to be microoptimizing around a call to xrealloc, but this sort
  > of code is all over the place.
No, gcc knows how to do that just fine.  See caller-save.c.

It may be the case that it does not think it is profitable to do so.  If so,
you would need to find out why.

But the first and most important thing is to start looking at rtl, not
the assembly code and find out where things start to go wrong.  We can sit
here and speculate about caller-saves, regmove, register allocation, spills,
etc, but it does not do us any good.  You need to look at how we got to this
bad assembly code, step by step.  I recommend starting with either reload or
local allocation, then working either forward or backwards depending on what
you find.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:36         ` Jeffrey A Law
@ 1999-01-25 19:56           ` Zack Weinberg
  1999-01-25 20:33             ` Jeffrey A Law
  1999-01-31 23:58             ` Zack Weinberg
  1999-01-31 23:58           ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 19:56 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
>  > 	decl -4112(%ebp)
>  > 	movl -4112(%ebp),%ecx
>  > 
>  > - if this were written the other way around, you'd avoid an address
>  > generation, and a read-mod-write to memory; note that this particular
>  > location is unlikely to be in L1.  Also the code would be smaller.
>  > This looks like the problem Marc Espie keeps complaining about.
>First, find out where and why we're getting this code.  Are we getting
>reloads?  Look at the dump outputs, they're going to tell you more than
>I ever could from looking at the assembly.

This one turns out to be alias analysis's fault.

C:
	/* begin basic block */
	*--ptr = '\n';
	break;
	/* end basic block */

where ptr is (ought to be) known to be pointing into a char array on
the stack.

Just before reload:

;; Start of basic block 17, registers live: 6 [bp] 7 [sp] 22 24 25 26 27 28 29 30 31 33 34 92 93
(insn 209 207 211 (set (reg/v:SI 30)
        (plus:SI (reg/v:SI 30)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 211 209 213 (set (mem:QI (reg/v:SI 30) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 209 (nil))
    (nil))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17


Perfectly rational.

After reload:

;; Start of basic block 17, registers live: 3 [bx] 4 [si] 6 [bp] 7 [sp]
(insn 209 207 689 (set (mem:SI (plus:SI (reg:SI 6 %ebp)
                (const_int -4112)) 0)
        (plus:SI (mem:SI (plus:SI (reg:SI 6 %ebp)
                    (const_int -4112)) 0)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 689 209 211 (set (reg:SI 2 %ecx)
        (mem:SI (plus:SI (reg:SI 6 %ebp)
                (const_int -4112)) 0)) 54 {movsi+2} (nil)
    (nil))

(insn 211 689 213 (set (mem:QI (reg:SI 2 %ecx) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 689 (insn_list 209 (nil)))
    (expr_list:REG_DEAD (reg:SI 2 %ecx)
        (nil)))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17

So it decided to do the pointer decrement in memory, then fetch it
(new insn 689) and write.  Why?  Well, there's a reference to an
extern char array in this function - nowhere near bb 17, but never
mind.  It's only ever used as an rvalue.  If I label it `const', we
get after reload:

;; Start of basic block 17, registers live: 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp]
(insn 209 207 211 (set (reg/v:SI 5 %edi)
        (plus:SI (reg/v:SI 5 %edi)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 211 209 213 (set (mem:QI (reg/v:SI 5 %edi) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 209 (insn_list 209 (nil)))
    (nil))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17

from which I conclude that it thinks that the extern char array can
alias something, and it has to keep the pointer consistent in memory
just in case.

This is surprising to me because all the char pointers in this
function should have been pegged as either on the stack or in the
malloc arena by alias analysis.  The extern char array is in data
space, which can't collide with either, so we ought to be able to
assume it is unmodified even if it isn't marked const.  (Yes, I used
-fstrict-aliasing.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 19:56           ` Zack Weinberg
@ 1999-01-25 20:33             ` Jeffrey A Law
  1999-01-25 20:41               ` Zack Weinberg
  1999-01-31 23:58               ` Jeffrey A Law
  1999-01-31 23:58             ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-25 20:33 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260356.WAA09714@blastula.phys.columbia.edu >you write:
  > On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
  > >  > 	decl -4112(%ebp)
  > >  > 	movl -4112(%ebp),%ecx
  > >  > 
  > >  > - if this were written the other way around, you'd avoid an address
  > >  > generation, and a read-mod-write to memory; note that this particular
  > >  > location is unlikely to be in L1.  Also the code would be smaller.
  > >  > This looks like the problem Marc Espie keeps complaining about.
  > >First, find out where and why we're getting this code.  Are we getting
  > >reloads?  Look at the dump outputs, they're going to tell you more than
  > >I ever could from looking at the assembly.
  > 
  > This one turns out to be alias analysis's fault.
  > 
  > C:
  > 	/* begin basic block */
  > 	*--ptr = '\n';
  > 	break;
  > 	/* end basic block */
  > 
  > where ptr is (ought to be) known to be pointing into a char array on
  > the stack.
Hmmm, it looks more like a spill than an alias problem.  ie, (reg 30) is not
homed in a hard register, but is instead is homed in a stack slot.

This is accomplished by changing (reg 30) to (mem (plus (fp) (offset))
in every occurrence (BTW the expression (reg 30) is shared).

The increment is still a valid insn after that transformation, so it's left
alone.

We have to load the value out of the stack slot for insn 211 because we do
not have indirect addressing (ie, (mem (mem ...)) is not a valid operand).
That's where insn 689 comes from.

Before looking for ways to optimize the reloaded code, we should look into
the register assignments and reloading.

ie, did (reg 30) initially get a hard register, but then get spilled because
the hard register was needed elsewhere?  Or did (reg 30) simply not get a hard
register because none was available.

So, first find out if (reg 30) is local to a single basic block.  You can find
that info in the .lreg file.  It'll say something like:

Register 99 used 2 times across 4 insns in block 0; set 1 time; GENERAL_REGS or 

If it doesn't say "in block X", then the register is live across multiple
blocks and can only be allocated by global allocation.  If it does say
"in block X", then it *may* be allocated by local allocation.

If it is allocated by local allocation, then in the .lreg file there'll be
a message like:

;; Register 95 in 19.

Which means pseudo 95 was assigned to hard register 19 (obviously the #s will
change, I'm taking tidbits from a PA dump).


What you find will determine what we look for in the .greg dump.


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:33             ` Jeffrey A Law
@ 1999-01-25 20:41               ` Zack Weinberg
  1999-01-25 20:53                 ` Jeffrey A Law
  1999-01-31 23:58                 ` Zack Weinberg
  1999-01-31 23:58               ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 20:41 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 21:30:10 -0700, Jeffrey A Law wrote:
>
>  In message < 199901260356.WAA09714@blastula.phys.columbia.edu >you write:
>  > On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
>  > >  > 	decl -4112(%ebp)
>  > >  > 	movl -4112(%ebp),%ecx
>  > >  > 
>  > >  > - if this were written the other way around, you'd avoid an address
>  > >  > generation, and a read-mod-write to memory; note that this particular
>  > >  > location is unlikely to be in L1.  Also the code would be smaller.
>  > >  > This looks like the problem Marc Espie keeps complaining about.
>  > >First, find out where and why we're getting this code.  Are we getting
>  > >reloads?  Look at the dump outputs, they're going to tell you more than
>  > >I ever could from looking at the assembly.
>  > 
>  > This one turns out to be alias analysis's fault.
>  > 
>  > C:
>  > 	/* begin basic block */
>  > 	*--ptr = '\n';
>  > 	break;
>  > 	/* end basic block */
>  > 
>  > where ptr is (ought to be) known to be pointing into a char array on
>  > the stack.
>Hmmm, it looks more like a spill than an alias problem.  ie, (reg 30) is not
>homed in a hard register, but is instead is homed in a stack slot.
>
>This is accomplished by changing (reg 30) to (mem (plus (fp) (offset))
>in every occurrence (BTW the expression (reg 30) is shared).
>
>The increment is still a valid insn after that transformation, so it's left
>alone.

Right.  The reason I thought it was alias, is that it gets a hard
register if I change the extern char array declaration to be const.

>Before looking for ways to optimize the reloaded code, we should look into
>the register assignments and reloading.
>
>ie, did (reg 30) initially get a hard register, but then get spilled because
>the hard register was needed elsewhere?  Or did (reg 30) simply not get a hard
>register because none was available.
>
>So, first find out if (reg 30) is local to a single basic block.

It's not. lreg says

Register 30 used 53 times across 167 insns; set 7 times; user var; 
	crosses 3 calls; GENERAL_REGS or none; pointer.

(It's referenced in the outer loop and in several special cases
of the inner loop.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:41               ` Zack Weinberg
@ 1999-01-25 20:53                 ` Jeffrey A Law
  1999-01-25 21:18                   ` Zack Weinberg
  1999-01-31 23:58                   ` Jeffrey A Law
  1999-01-31 23:58                 ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-25 20:53 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260441.XAA09930@blastula.phys.columbia.edu >you write:
  > Right.  The reason I thought it was alias, is that it gets a hard
  > register if I change the extern char array declaration to be const.
Interesting.  But that may work because it allows optimizations earlier in
the compiler, and for whatever reason we can find a hard reg for (reg 30).

  > Register 30 used 53 times across 167 insns; set 7 times; user var; 
  > 	crosses 3 calls; GENERAL_REGS or none; pointer.
  > 
  > (It's referenced in the outer loop and in several special cases
  > of the inner loop.)
OK.  So at the top of the .greg dump you there should be a list of registers
in priority order with a conflict matrix.  That should give you some idea
of how it fits in with the rest of the registers that need to be allocated
by global alloc.  Probably worth sending the regs to allocate list up to
and including reg 30.  Probably not worth sending the whole conflict list yet.

You also need to look in the .greg dump for messages like

Spilling for insn 77.
Spilling reg 1.
 Register 110 now in 31.
 Register 120 now in stack


Probably the first thing to look for is "Register 30 now in stack" which
would indicate that it got a register during global allocation, but the
register it got was spilled and re-allocation failed.

If there isn't any such message, then the reg 30 never got a hard register
because it's priority was too low.

Looking at the calls & uses info, if a caller-saved register is available, the
compiler should use it (4 * #calls < #refs).


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:53                 ` Jeffrey A Law
@ 1999-01-25 21:18                   ` Zack Weinberg
  1999-01-25 21:30                     ` Jeffrey A Law
  1999-01-31 23:58                     ` Zack Weinberg
  1999-01-31 23:58                   ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 21:18 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 21:49:32 -0700, Jeffrey A Law wrote:
>
>  In message < 199901260441.XAA09930@blastula.phys.columbia.edu >you write:
>  > Right.  The reason I thought it was alias, is that it gets a hard
>  > register if I change the extern char array declaration to be const.
>Interesting.  But that may work because it allows optimizations earlier in
>the compiler, and for whatever reason we can find a hard reg for (reg 30).

The only other difference it seems to make is that the other vars in
hard regs got shuffled around a bit.

>  > Register 30 used 53 times across 167 insns; set 7 times; user var; 
>  > 	crosses 3 calls; GENERAL_REGS or none; pointer.
>  > 
>  > (It's referenced in the outer loop and in several special cases
>  > of the inner loop.)
>OK.  So at the top of the .greg dump you there should be a list of registers
>in priority order with a conflict matrix.  That should give you some idea
>of how it fits in with the rest of the registers that need to be allocated
>by global alloc.  Probably worth sending the regs to allocate list up to
>and including reg 30.  Probably not worth sending the whole conflict list yet.


;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...

;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
		 46 49 52 57 58 64 70 75 92 93 94 0 7

Register 30 used 53 times across 167 insns; crosses 3 calls; 
	GENERAL_REGS or none; pointer.

The odd thing is, the use count for reg 30 shouldn't be anywhere near
that high.  It's used at the C level to reset the input buffer pointer
each time through the outer loop, and some code modifies it just
before jumping back to the beginning of the outer loop.  (GCC doesn't
seem to realize that it's going back to the beginning of the outer
loop - I could fix that with gotos, but it would be ugly.)

I'm much more interested in regs 44 and 75, actually. They're both the
same variable (one in QI, one in SI) which is critical in the inner
loop.  Both are ahead of reg 30; 75 gets a hard register and 44
doesn't.  Reg 44 is used for several different things, maybe I should
give them different C-level variables.

;; 44 conflicts: 22 24 25 26 27 28 29 30 31 33 34 44 52 57 58 64 75 92 93 94 7
;; 75 conflicts: 22 24 25 26 27 28 29 30 31 33 34 44 75 92 93 7

Register 44 used 33 times across 60 insns; dies in 4 places; crosses 1
	call; 1 bytes; pref Q_REGS, else GENERAL_REGS.

Register 75 used 18 times across 9 insns; dies in 2 places; 
	pref Q_REGS, else GENERAL_REGS.

>You also need to look in the .greg dump for messages like
>
>Spilling for insn 77.
>Spilling reg 1.
> Register 110 now in 31.
> Register 120 now in stack

There's a whole mess of 'em, but they are singularly uninformative.

Spilling for insn 4.
Spilling reg 0.
Spilling for insn 8.
Spilling reg 0.
Spilling for insn 10.
Spilling reg 0.
Spilling for insn 16.
Spilling reg 0.
...

It never says anything other than reg 0, 1, or 2, and it hits just
about every insn in the function.

There's no `Register X now in stack' message for any register.

>Probably the first thing to look for is "Register 30 now in stack" which
>would indicate that it got a register during global allocation, but the
>register it got was spilled and re-allocation failed.
>
>If there isn't any such message, then the reg 30 never got a hard register
>because it's priority was too low.
>
>Looking at the calls & uses info, if a caller-saved register is available, the
>compiler should use it (4 * #calls < #refs).

Are you looking at the same function as me?  (It's read_and_prescan
out of the cpplib patch I sent earlier this evening.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:18                   ` Zack Weinberg
@ 1999-01-25 21:30                     ` Jeffrey A Law
  1999-01-25 21:38                       ` Zack Weinberg
  1999-01-31 23:58                       ` Jeffrey A Law
  1999-01-31 23:58                     ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-25 21:30 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  > ;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...
  > 
  > ;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
  > 		 46 49 52 57 58 64 70 75 92 93 94 0 7
Well, there's not likely to be any register available for 30.  It conflicts
with 10 other registers that have a higher priority.  For a machine with 6
user regs the chances are slim.  The only hope is that some of those regs
in the conflict list can be allocated into a the same hard register.


  > Register 30 used 53 times across 167 insns; crosses 3 calls; 
  > 	GENERAL_REGS or none; pointer.
  > 
  > The odd thing is, the use count for reg 30 shouldn't be anywhere near
  > that high.
Use counts are weighted to give pseudos used in loops a better chance of being
allocated into a register.

  > I'm much more interested in regs 44 and 75, actually. They're both the
  > same variable (one in QI, one in SI) which is critical in the inner
  > loop.  Both are ahead of reg 30; 75 gets a hard register and 44
  > doesn't.  Reg 44 is used for several different things, maybe I should
  > give them different C-level variables.
Quite likely.  We have some code to do live range splitting at Cygnus.  I
haven't tried to make a case to integrate it into egcs as it will be made
obsolete by David Miller's new allocator.

  > 
OK.

  > It never says anything other than reg 0, 1, or 2, and it hits just
  > about every insn in the function.
Right.  We have to scan each insn in the function to see if any registers
need to be spilled at that particular insn.

  > There's no `Register X now in stack' message for any register.
OK.  That means the spills didn't force anything into the stack.  So (reg 30)
never got a hard register to begin with.  Given its position in the list
and the conflict matrix, I'm not suprised.

  > Are you looking at the same function as me?  (It's read_and_prescan
  > out of the cpplib patch I sent earlier this evening.)
I'm not looking at the C code at all, just the dump info you're providing.

The use/calls crossed counts you provided would make that pseudo a candidate
for caller-save optimizations.  However, there isn't a register available.

The next step is to find out what registers are used by the entries on the
conflict list.  In the .greg file there should be something like this:

;; Register dispositions:
94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
[ ... ]


Any register not listed is on the stack.

Find out what regs where used by conflicting pseudos with a higher priority.

I bet you'll find there isn't a free register that can be used for (reg 30).

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:30                     ` Jeffrey A Law
@ 1999-01-25 21:38                       ` Zack Weinberg
  1999-01-26  4:59                         ` Jeffrey A Law
  1999-01-31 23:58                         ` Zack Weinberg
  1999-01-31 23:58                       ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-25 21:38 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 22:27:06 -0700, Jeffrey A Law wrote:
>
>
>
>  > ;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...
>  > 
>  > ;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
>  > 		 46 49 52 57 58 64 70 75 92 93 94 0 7
>Well, there's not likely to be any register available for 30.  It conflicts
>with 10 other registers that have a higher priority.  For a machine with 6
>user regs the chances are slim.  The only hope is that some of those regs
>in the conflict list can be allocated into a the same hard register.

Right.

>
>  > Register 30 used 53 times across 167 insns; crosses 3 calls; 
>  > 	GENERAL_REGS or none; pointer.
>  > 
>  > The odd thing is, the use count for reg 30 shouldn't be anywhere near
>  > that high.
>Use counts are weighted to give pseudos used in loops a better chance of being
>allocated into a register.

Ah, I see.

>  > It never says anything other than reg 0, 1, or 2, and it hits just
>  > about every insn in the function.
>Right.  We have to scan each insn in the function to see if any registers
>need to be spilled at that particular insn.

Those are hard register numbers, right?

>The next step is to find out what registers are used by the entries on the
>conflict list.  In the .greg file there should be something like this:
>
>;; Register dispositions:
>94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
>102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
>[ ... ]

23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  

which is strange: hard register 2 (ecx) isn't used for anything.  What
would cause that?

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:38                       ` Zack Weinberg
@ 1999-01-26  4:59                         ` Jeffrey A Law
  1999-01-27  9:26                           ` Zack Weinberg
  1999-01-31 23:58                           ` Jeffrey A Law
  1999-01-31 23:58                         ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-26  4:59 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260538.AAA10305@blastula.phys.columbia.edu >you write:
  > >Right.  We have to scan each insn in the function to see if any registers
  > >need to be spilled at that particular insn.
  > 
  > Those are hard register numbers, right?
Close enough.  It's a little more complicated than that.  But yes, we have
to find points where (for example) a pseudo got the wrong kind of hard
register.

We emit copy insns to get the value in/out of the register we need.

We must spill any value which allocated to the hard register we need for the
length of time that we need a particular hard register.  [ This is a recent
improvement, we used to spill *everything* that was allocated to the hard
register we need. ]

The act of spilling at one site may cause spilling elsewhere, so it's a
repeat until nothing changes algorithm.


  > >The next step is to find out what registers are used by the entries on the
  > >conflict list.  In the .greg file there should be something like this:
  > >
  > >;; Register dispositions:
  > >94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
  > >102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
  > >[ ... ]
  > 
  > 23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
  > 43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
  > 75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  
  > 
  > which is strange: hard register 2 (ecx) isn't used for anything.  What
  > would cause that?
Check and see if there are any "Spilling reg 2." messages.  I bet you'll find
one for an insn over which pseudo 30 is live.


Anyway.

  Pseudo	Hard
    64		  0
    75		  0
    46		  0
    49		  0
    32		  1
    28		  3
    44		 None
    79		  0
    85		  0
    94		  0
    27		  4
    57		  5
    30		 None


As you mentioned, it might be worth doing a manual live range split so that
reg 44 can get allocated into a hard register rather than a stack slot.

It's interesting to note that pseudos 27 and 32 are the only registers
allocated to hard registers 4 and 1 respectively.  Presumably they have long
lifetimes (which makes them conflict with everything) and high usage counts
which gives them a high priority.

It would be interesting to know what user variables those pseudos correspond
to (or temporary expressions if they aren't a user variable).  They may be
candidates for splitting too.  It would also be interesting to relate them
back to the source to see if their weighted counts make any sense.

Consider a loop with a switch statement that is executed once every loop
iteration.  In different cases of the switch statement we have a use of a
pseudo.  Even though only one of the case statements will execute each
iteration of the loop, we might be summing their use counts.

Or more generally, I'm not sure if our weighted counting system looks at the
flow control within a loop at all.  Doing so may (or may not) be difficult.

That may (or may not) be important for this case.  I do remember a very old
(circa 1991) "bug" report about this.

The other thing to look at would be the accuracy of the live length count
since that's one of the other major components to compute priority.  I do not
believe we recompute that count after combine & regmove scramble insns.  Thus
they may be inaccurate.  Again, those inaccuracies may or may not be important
for this example.

I guess in summary, so far it looks like we don't have a register available,
but we need to make sure that the allocations we made before pseudo 30 make
sense.  ie, are those registers really used more often, or do we have a
problem with our counts?


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:31       ` Zack Weinberg
  1999-01-25 13:36         ` Jeffrey A Law
@ 1999-01-26  7:44         ` Joern Rennecke
  1999-01-27  8:35           ` Zack Weinberg
  1999-01-31 23:58           ` Joern Rennecke
  1999-01-31 23:58         ` Zack Weinberg
  2 siblings, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-26  7:44 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: law, amylaar, egcs

> There is questionable code elsewhere, such as:
> 
> 	decl -4112(%ebp)
> 	movl -4112(%ebp),%ecx

This is a CISC specific problem - load-store architectures don't see it since
they have to do a load first for the decrement, and then can inherit from the
store.
I think it could be tackled by something I'd call 'reverse inheritance' - you
remember when could have used the value in a register, and when you later
find a load of the same value, and the register is free from the desired use
to the load, the load can be moved upwards.
However, in some cases - like this one - this strategy increases code size
and issue bandwidth, so its use has to be conditionalized on ! optimize_size
and on compiling for a processor where simplicity of operations is more
important than the mere instruction count.

> C:
> 	buf = xrealloc (buf, op - buf);
> 	fp->buf = buf;
> 	return op - buf;
> 
> becomes
> 
> 	movl %ebx,%eax
> 	subl -4104(%ebp),%eax
> 	pushl %eax
> 	movl -4104(%ebp),%edx
> 	pushl %edx
> 	call xrealloc
> 	movl %eax,-4104(%ebp)
> 	movl 12(%ebp),%eax
> 	movl -4104(%ebp),%ecx
> 	movl %ecx,(%eax)
> 	subl %ecx,%ebx
> 	movl %ebx,%eax
> 	jmp epilogue

The first problem here is that (op - buf) hasn't been subjected to cse.
Is the address of buf taken anywhere in this function?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-26  7:44         ` Joern Rennecke
@ 1999-01-27  8:35           ` Zack Weinberg
  1999-01-27  9:08             ` Joern Rennecke
                               ` (2 more replies)
  1999-01-31 23:58           ` Joern Rennecke
  1 sibling, 3 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-27  8:35 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: law, egcs

On Tue, 26 Jan 1999 15:44:06 +0000 (GMT), Joern Rennecke wrote:

>> C:
>> 	buf = xrealloc (buf, op - buf);
>> 	fp->buf = buf;
>> 	return op - buf;
>> 
>> becomes
[...]

>The first problem here is that (op - buf) hasn't been subjected to cse.
>Is the address of buf taken anywhere in this function?

No, neither buf nor op has its address taken.

Right before CSE, we have this RTL for the above code.  CSE makes no
changes except to annotate insns with patterns.  Flow thinks this is
one basic block, and that reg 86 is dead after insn 608.

Local register allocation says reg 86 is bb-local but doesn't give it
a hard register.  Combine collapses reg 87 into reg 28.

(code_label 603 601 606 47 "")

(insn 606 603 608 (set (reg:SI 86)
        (minus:SI (reg/v:SI 28)
            (reg/v:SI 26))) -1 (nil)
    (nil))

(insn 608 606 610 (set (mem:SI (pre_dec:SI (reg:SI 7 %esp)) 0)
        (reg:SI 86)) -1 (nil)
    (nil))

(insn 610 608 612 (set (mem:SI (pre_dec:SI (reg:SI 7 %esp)) 0)
        (reg/v:SI 26)) -1 (nil)
    (nil))

(call_insn 612 610 614 (set (reg:SI 0 %eax)
        (call (mem:QI (symbol_ref:SI ("xrealloc")) 0)
            (const_int 8))) -1 (nil)
    (nil)
    (nil))

(insn 614 612 617 (set (reg/v:SI 26)
        (reg:SI 0 %eax)) -1 (nil)
    (nil))

(insn 617 614 620 (set (mem/s:SI (reg/v:SI 23) 5)
        (reg/v:SI 26)) -1 (nil)
    (nil))

(insn 620 617 622 (set (reg:SI 87)
        (minus:SI (reg/v:SI 28)
            (reg/v:SI 26))) -1 (nil)
    (nil))

(insn 622 620 624 (set (reg/i:SI 0 %eax)
        (reg:SI 87)) -1 (nil)
    (nil))

(jump_insn 624 622 625 (set (pc)
        (label_ref 654)) -1 (nil)
    (nil))

(barrier 625 624 626)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  8:35           ` Zack Weinberg
@ 1999-01-27  9:08             ` Joern Rennecke
  1999-01-27  9:52               ` Zack Weinberg
  1999-01-31 23:58               ` Joern Rennecke
  1999-01-28  8:11             ` Jeffrey A Law
  1999-01-31 23:58             ` Zack Weinberg
  2 siblings, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-27  9:08 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: amylaar, law, egcs

> Right before CSE, we have this RTL for the above code.  CSE makes no
> changes except to annotate insns with patterns.  Flow thinks this is

Well, why doesn't it use reg 86 in insn 620?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-26  4:59                         ` Jeffrey A Law
@ 1999-01-27  9:26                           ` Zack Weinberg
  1999-01-28  5:26                             ` Jeffrey A Law
  1999-01-31 23:58                             ` Zack Weinberg
  1999-01-31 23:58                           ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-27  9:26 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Tue, 26 Jan 1999 05:55:21 -0700, Jeffrey A Law wrote:
>  > >The next step is to find out what registers are used by the
>  > >entries on the conflict list.  In the .greg file there should be
>  > >something like this:
>  > >
>  > >;; Register dispositions:
>  > >94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
>  > >102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
>  > >[ ... ]
>  > 
>  > 23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
>  > 43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
>  > 75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  
>  > 
>  > which is strange: hard register 2 (ecx) isn't used for anything.  What
>  > would cause that?
>Check and see if there are any "Spilling reg 2." messages.  I bet you'll find
>one for an insn over which pseudo 30 is live.

Nope.  No spills of reg 2 anywhere.

>As you mentioned, it might be worth doing a manual live range split so that
>reg 44 can get allocated into a hard register rather than a stack slot.
>
>It's interesting to note that pseudos 27 and 32 are the only registers
>allocated to hard registers 4 and 1 respectively.  Presumably they have long
>lifetimes (which makes them conflict with everything) and high usage counts
>which gives them a high priority.
>
>It would be interesting to know what user variables those pseudos correspond
>to (or temporary expressions if they aren't a user variable).  They may be
>candidates for splitting too.  It would also be interesting to relate them
>back to the source to see if their weighted counts make any sense.

Pseudo 27 was the input buffer pointer, which deserved a permanent
register.  Pseudo 32 was `count', a variable used heavily in the outer
loop but not anywhere in the inner loop - code like this:

for (;;)
{
  count = read (...);
  if (count < 0) goto error;
  else if (count == 0) break;
  offset += count;
  ip = ibase;
  ip[count] = '\0';
  if (offset > limit) { /* grow output buffer */ }

  while ((c = *ip++) != '\0')
    switch (c)
    {
      /* count not used anywhere in here */
    }
}

I put a block around the real useful range of `count' and declared it
there, and it now shares reg 1 with pseudo 96 which appears to be a
pointer to the `cpp_options' structure.

That, plus inventing a variable to use instead of `c' inside the
switch, and adjusting some branches to go straight to their final
destinations, has fixed up the allocation pretty well.

;; Register dispositions:
23 in 0  27 in 4  28 in 3  29 in 5  35 in 1  40 in 0  
42 in 3  43 in 5  44 in 2  46 in 0  49 in 0  58 in 0  
65 in 0  76 in 0  80 in 0  81 in 0  86 in 0  96 in 1  
;; Hard regs used:  0 1 2 3 4 5 6 7 16

Reg 76 used to be reg 75.  Reg 44 now gets a hard register, and we're
using all the registers again.  The invented variable, which is reg
56, does not get a hard register - that's unfortunate but not nearly
as bad.  I think it's because the code that uses reg 56 looks like
this:

	case '?':
	if (opts->trigraphs || opts->warn_trigraphs)
	{
	  char d;
	  ... stuff involving d but not c ...
	}
	else
	   *op++ = c;
	break;

It would need to realize that c is dead if the conditional is true.
That's live-range-splitting again, yes?

The failure to recognize that `count' is only live in the outer part
of the loop concerns me.  We do have live range *trimming*, don't we?
If so, why doesn't it work here?

So yes, this code benefits considerably from live range splitting.  I
am not sure about the use-count issue, I'd have to stare a lot more.
Another place this code could win is if local register allocation
could handle a block like this itself:

{
  int x = read(...)
  if (x < 0)
    goto error;
  else if (x == 0)
    goto eof;
 
   /* do stuff with x here */
}

I think this is what you meant by `extended basic blocks' ?  There are
a bunch of cases like this.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:08             ` Joern Rennecke
@ 1999-01-27  9:52               ` Zack Weinberg
  1999-01-27 11:49                 ` Marc Espie
  1999-01-31 23:58                 ` Zack Weinberg
  1999-01-31 23:58               ` Joern Rennecke
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-27  9:52 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: law, egcs

On Wed, 27 Jan 1999 17:07:37 +0000 (GMT), Joern Rennecke wrote:
>> Right before CSE, we have this RTL for the above code.  CSE makes no
>> changes except to annotate insns with patterns.  Flow thinks this is
>
>Well, why doesn't it use reg 86 in insn 620?

I've no idea.  I can do the CSE by hand, which produces better,
but still suboptimal assembler:

        subl -4104(%ebp),%ebx
        pushl %ebx
        movl -4104(%ebp),%ecx
        pushl %ecx
        call xrealloc
        movl %eax,-4104(%ebp)
        movl 12(%ebp),%eax
        movl -4104(%ebp),%edx
        movl %edx,(%eax)
        movl %ebx,%eax

Two issues here.  One is that after the call, %eax is spilled and
restored to %edx, so we can load the pointer at 12(%ebp) into %eax.
It's willing to use %edx here, so why not just load the pointer in
there?  Then we could generate

	movl 12(%ebp),%edx
	movl %eax,(%edx)
	movl %ebx,%eax

This might be lreg's fault.  Here's the RTL after lreg:

(insn 618 616 676 (set (reg/v:SI 26)
        (reg:SI 0 %eax)) 54 {movsi+2} (insn_list 616 (nil))
    (expr_list:REG_DEAD (reg:SI 0 %eax)
        (nil)))

(insn 676 618 621 (set (reg/v:SI 23)
        (mem:SI (plus:SI (reg:SI 16 %argp)
                (const_int 4)) 2)) 54 {movsi+2} (nil)
    (expr_list:REG_EQUIV (mem/f:SI (plus:SI (reg:SI 16 %argp)
                (const_int 4)) 2)
        (nil)))

(insn 621 676 624 (set (mem/s:SI (reg/v:SI 23) 5)
        (reg/v:SI 26)) 54 {movsi+2} (insn_list 618 (nil))
    (expr_list:REG_DEAD (reg/v:SI 26)
        (expr_list:REG_DEAD (reg/v:SI 23)
            (nil))))

(insn 624 621 626 (set (reg/i:SI 0 %eax)
        (reg/v:SI 86)) 54 {movsi+2} (nil)
    (expr_list:REG_DEAD (reg/v:SI 86)
        (nil)))

and lreg says of (reg/v:SI 23)
Register 23 used 2 times across 2 insns in block 58; set 1 time; 
	user var; GENERAL_REGS or none; pointer.

but it didn't give it a hard register...

The other issue is that before the call, we generate what looks to me
like silly assembler:

        subl -4104(%ebp),%ebx
        pushl %ebx
        movl -4104(%ebp),%ecx
        pushl %ecx

Which could be written

        movl -4104(%ebp),%ecx
        subl %ecx,%ebx
        pushl %ebx
        pushl %ecx

saving four bytes of machine code, and I'd think it's faster this way
too (but I'm no expert on Intel scheduling issues).

If I use -Os, it generates almost the assembly I suggested, but
there's a pile of other changes which invalidate the analysis: for
example, -4104(%ebp) is now in %edi apparently throughout the function.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:52               ` Zack Weinberg
@ 1999-01-27 11:49                 ` Marc Espie
  1999-01-31 23:58                   ` Marc Espie
  1999-01-31 23:58                 ` Zack Weinberg
  1 sibling, 1 reply; 54+ messages in thread
From: Marc Espie @ 1999-01-27 11:49 UTC (permalink / raw)
  To: zack; +Cc: egcs

In article < 199901271751.MAA20528@blastula.phys.columbia.edu > you write:

>The other issue is that before the call, we generate what looks to me
>like silly assembler:

>        subl -4104(%ebp),%ebx
>        pushl %ebx
>        movl -4104(%ebp),%ecx
>        pushl %ecx

>Which could be written

>        movl -4104(%ebp),%ecx
>        subl %ecx,%ebx
>        pushl %ebx
>        pushl %ecx

>saving four bytes of machine code, and I'd think it's faster this way
>too (but I'm no expert on Intel scheduling issues).

Thank you for looking at this issue. I don't know enough about egcs
internals nor x86 assembly yet to phrase things in such a precise
way.

Yep, this is exactly the kind of thing I've been seeing since 1.1.1,
even more obvious in some cases, such as mucking with a value directly
in memory, before bringing it in a register... without ANY branch or
code in sight that would justify such behavior.

Even though I don't know much x86 assembler, this does indeed look weird
to me.

I also tried profiling this kind of behavior.  Both flavors (in-memory
vs. register) do exhibit the same run-time... that's probably cache
at work. 

>If I use -Os, it generates almost the assembly I suggested, but
>there's a pile of other changes which invalidate the analysis: for
>example, -4104(%ebp) is now in %edi apparently throughout the function.

There's something I don't quite understand in the way egcs works.
It seems that all this register allocation behavior occurs only at the
global level.  What is weird is that sometimes, the resulting code can
obviously be improved, to the point that a human being can easily perform 
the incremental change needed. I would have thought there would have been
a peep-hole pass that would take care of this kind of back-end issues...

On another note,  I finally managed to build a complete openbsd system
with egcs990124.  Code quality has vastly improved in other areas since
1.1.1. The size of -Os optimized code is much closer than what gcc 2.8.1 -O2
used to give.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:26                           ` Zack Weinberg
@ 1999-01-28  5:26                             ` Jeffrey A Law
  1999-01-28 17:20                               ` Zack Weinberg
  1999-01-31 23:58                               ` Jeffrey A Law
  1999-01-31 23:58                             ` Zack Weinberg
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-28  5:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901271725.MAA20390@blastula.phys.columbia.edu >you write:
  > Nope.  No spills of reg 2 anywhere.
Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
think of a reason why it would not be used..

It's a caller-save register, so it should only be used for either values
which are not live across calls, or which have high use/#calls crossed
counts.

I think we already determined that pseudo 30 had a high enough ratio to use
a caller saved reg, so I'm rather suprised that pseudo 30 wasn't allocated
into hard reg 2.

It may be worth a breakpoint in global.c::find_reg to see what's going on.

You'll have to map from an allocno back to the pseudo.  Probably the easiest
way to do that is to look at reg_allocno[30].

I guess it is possible the allocno's use/#calls cross ratio isn't high enough
if multiple registers are assigned the same allocno.  Though I'm not sure if
global does that or not.

Anyway, once you get control in find_reg for the right allocno it should be
possible to determine why we didn't assign reg 30 to %ecx and emit save/restore
code around call sites.


  > Pseudo 27 was the input buffer pointer, which deserved a permanent
  > register.  Pseudo 32 was `count', a variable used heavily in the outer
  > loop but not anywhere in the inner loop - code like this:
  > 
  > for (;;)
  > {
  >   count = read (...);
  >   if (count < 0) goto error;
  >   else if (count == 0) break;
  >   offset += count;
  >   ip = ibase;
  >   ip[count] = '\0';
  >   if (offset > limit) { /* grow output buffer */ }
  > 
  >   while ((c = *ip++) != '\0')
  >     switch (c)
  >     {
  >       /* count not used anywhere in here */
  >     }
  > }
Ah.  I wonder if global is considering pseudo 32 live across the inner loop.
As far as I can tell it isn't really live across the inner loop and isn't
used in the inner loop.  In theory, we should be able to re-use its hard reg
for a different pseudo inside the loop.

Something else to investigate.  Why does psuedo 32 conflict with pseudo 30.

  > I put a block around the real useful range of `count' and declared it
  > there, and it now shares reg 1 with pseudo 96 which appears to be a
  > pointer to the `cpp_options' structure.
  >
  > 
  > That, plus inventing a variable to use instead of `c' inside the
  > switch, and adjusting some branches to go straight to their final
  > destinations, has fixed up the allocation pretty well.
Yup.  Manual LRS.  Probably more sophisticated than the stuff Cygnus did, but
it does show how effective such opts can be on this kind of target.


  > Reg 76 used to be reg 75.  Reg 44 now gets a hard register, and we're
  > using all the registers again.  The invented variable, which is reg
  > 56, does not get a hard register - that's unfortunate but not nearly
  > as bad.  I think it's because the code that uses reg 56 looks like
  > this:
  > 
  > 	case '?':
  > 	if (opts->trigraphs || opts->warn_trigraphs)
  > 	{
  > 	  char d;
  > 	  ... stuff involving d but not c ...
  > 	}
  > 	else
  > 	   *op++ = c;
  > 	break;
  > 
  > It would need to realize that c is dead if the conditional is true.
  > That's live-range-splitting again, yes?
I'd have to see all the code.  "c" may or may not be dead.


  > The failure to recognize that `count' is only live in the outer part
  > of the loop concerns me.  We do have live range *trimming*, don't we?
  > If so, why doesn't it work here?
I agree (see above).  It's not "trimming", but just basic life analysis.  

I wonder if count is used outside the for loop?  I think it's time to go
back to the flow and lreg dumps and see what information they give about
pseudo 32.


  > So yes, this code benefits considerably from live range splitting.  I
  > am not sure about the use-count issue, I'd have to stare a lot more.
  > Another place this code could win is if local register allocation
  > could handle a block like this itself:
  > 
  > {
  >   int x = read(...)
  >   if (x < 0)
  >     goto error;
  >   else if (x == 0)
  >     goto eof;
  >  
  >    /* do stuff with x here */
  > }
  > 
  > I think this is what you meant by `extended basic blocks' ?  There are
  > a bunch of cases like this.
An extended basic block is a maximal list of instructions with only one
entry point.  That kind of code would be an extended basic block since there
aren't any joiner nodes.

There are a couple of transformations done by local-alloc which are not
appropriate for EBBs, but it should (in theory) be easy to have those
transformations only apply within a BB (some of the copy opts in particular).

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  8:35           ` Zack Weinberg
  1999-01-27  9:08             ` Joern Rennecke
@ 1999-01-28  8:11             ` Jeffrey A Law
  1999-01-28 11:40               ` Zack Weinberg
  1999-01-31 23:58               ` Jeffrey A Law
  1999-01-31 23:58             ` Zack Weinberg
  2 siblings, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-28  8:11 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901271634.LAA20086@blastula.phys.columbia.edu >you write:
  > On Tue, 26 Jan 1999 15:44:06 +0000 (GMT), Joern Rennecke wrote:
  > 
  > >> C:
  > >> 	buf = xrealloc (buf, op - buf);
  > >> 	fp->buf = buf;
  > >> 	return op - buf;
  > >> 
  > >> becomes
  > [...]
  > 
  > >The first problem here is that (op - buf) hasn't been subjected to cse.
  > >Is the address of buf taken anywhere in this function?
  > 
  > No, neither buf nor op has its address taken.
  > 
  > Right before CSE, we have this RTL for the above code.  CSE makes no
  > changes except to annotate insns with patterns.  Flow thinks this is
  > one basic block, and that reg 86 is dead after insn 608.
  > 
  > Local register allocation says reg 86 is bb-local but doesn't give it
  > a hard register.  Combine collapses reg 87 into reg 28.
There is no common subespression here because "buf" is changed (it's the return
value of the xrealloc call).

Thus, the value computed into pseudo 86 by insn 606 is not necessarily the same
as the value computed into pseudo 87 by insn 620.

jeff


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28  8:11             ` Jeffrey A Law
@ 1999-01-28 11:40               ` Zack Weinberg
  1999-01-31 23:58               ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-28 11:40 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Thu, 28 Jan 1999 09:02:45 -0700, Jeffrey A Law wrote:
>
>  In message < 199901271634.LAA20086@blastula.phys.columbia.edu >you write:
>  > On Tue, 26 Jan 1999 15:44:06 +0000 (GMT), Joern Rennecke wrote:
>  > 
>  > >> C:
>  > >> 	buf = xrealloc (buf, op - buf);
>  > >> 	fp->buf = buf;
>  > >> 	return op - buf;
>  > >> 
>  > >> becomes
>  > [...]
[...]
>There is no common subespression here because "buf" is changed (it's
>the return value of the xrealloc call).
>
>Thus, the value computed into pseudo 86 by insn 606 is not
>necessarily the same as the value computed into pseudo 87 by insn
>620.

Duh.  You're right, and in fact that's a bug.  Thank you.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28  5:26                             ` Jeffrey A Law
@ 1999-01-28 17:20                               ` Zack Weinberg
  1999-01-28 17:33                                 ` Joern Rennecke
  1999-01-28 18:01                                 ` Jeffrey A Law
  1999-01-31 23:58                               ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-28 17:20 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Thu, 28 Jan 1999 06:22:55 -0700, Jeffrey A Law wrote:
>
>  In message < 199901271725.MAA20390@blastula.phys.columbia.edu >you write:
>  > Nope.  No spills of reg 2 anywhere.
>Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
>think of a reason why it would not be used..

I can't reproduce this exact situation anymore, but it seems to want
to use reg 2 (and sometimes reg 1 also) for scratch purposes
exclusively.

>  > Pseudo 27 was the input buffer pointer, which deserved a permanent
>  > register.  Pseudo 32 was `count', a variable used heavily in the outer
>  > loop but not anywhere in the inner loop - code like this:
>  > 
>  > for (;;)
>  > {
>  >   count = read (...);
>  >   if (count < 0) goto error;
>  >   else if (count == 0) break;
>  >   offset += count;
>  >   ip = ibase;
>  >   ip[count] = '\0';
>  >   if (offset > limit) { /* grow output buffer */ }
>  > 
>  >   while ((c = *ip++) != '\0')
>  >     switch (c)
>  >     {
>  >       /* count not used anywhere in here */
>  >     }
>  > }
>Ah.  I wonder if global is considering pseudo 32 live across the inner loop.
>As far as I can tell it isn't really live across the inner loop and isn't
>used in the inner loop.  In theory, we should be able to re-use its hard reg
>for a different pseudo inside the loop.
>
>Something else to investigate.  Why does psuedo 32 conflict with pseudo 30.

Flow has the correct live range for pseudo 32, and the counts in lreg
are sane.  It now seems to get it right even when the variable is
declared with function scope.

The problem may have been with some atrociously tangled EOF handling
code which I have now thrown away.  There was another variable
trivially derived from `count' which was used in the inner loop. Some
pass (loop?) may have collapsed the two variables into one.

Now here's an interesting thing.  The inner loop was originally coded

for(;;)
{
    unsigned char c;
    c = *ip++;
    switch(c)
    {
	default:
	    *op++ = c;
	    break;
	/* more cases here */
    }
}

The top of the loop produced RTL like this:

(insn 166 162 167 (set (reg/v:QI 44)
        (mem:QI (reg/v:SI 27) 0)) -1 (nil)
    (nil))

(insn 167 166 169 (set (reg/v:SI 27)
        (plus:SI (reg/v:SI 27)
            (const_int 1))) -1 (nil)
    (nil))

(note 169 167 476 "" NOTE_INSN_DELETED)

(insn 476 169 477 (set (reg:SI 76)
        (zero_extend:SI (reg/v:QI 44))) -1 (nil)
    (nil))

(insn 477 476 478 (set (cc0)
        (compare (reg:SI 76)
            (const_int 10))) -1 (nil)
    (nil))
;; switch continues...

No pass was able to collapse regs 44 and 76 together, and we'd end up
with assembly output like so:

	movb (%esi),%cl
	incl %esi
	xorl %eax,%eax
	movb %cl,%al
	cmpl $10,%eax

Frequently %cl would be spilled out to a stack slot too.

If I instead use an unsigned int variable, then I get this rtl:

(insn 142 139 143 (set (reg/v:SI 38)
        (zero_extend:SI (mem:QI (reg/v:SI 27) 0))) -1 (nil)
    (nil))

(insn 143 142 145 (set (reg/v:SI 27)
        (plus:SI (reg/v:SI 27)
            (const_int 1))) -1 (nil)
    (nil))

(note 145 143 424 "" NOTE_INSN_DELETED)

(insn 424 145 425 (set (cc0)
        (compare (reg/v:SI 38)
            (const_int 10))) -1 (nil)
    (nil))
;; switch continues...

which winds up in the much more sensible

	xorl %eax,%eax
	movb (%esi),%al
	incl %esi
	cmpl $10,%eax
	...

Furthermore, it reliably keeps c in a register, and reliably clobbers
it when it's dead.  (It sometimes decides to leave the pointer on the
stack, but doing manual LRS on c fixes that.)  The previously
problematic

case '?':
  if (opts->trigraphs || opts->warn_trigraphs)
  {
    unsigned char d;
    /* stuff not involving c */
  }
  else
    *op++ = c;

is also handled sanely: gcc is aware that c is dead if the if
succeeds, and in fact it remembers what value is in c and reuses its
register for the conditional test.

Is it possible that the handling of QImode variables is not very
intelligent?

---

Another problem which could be related to Marc's code-size issues.
The stack frame generated for this function has a ~4K buffer and a
bunch of spilled pseudos.  The frame is laid out with the buffer
nearer to the frame pointer than the spills, so all the stack slot
offsets are large (range 4100-4150) and the assembler is forced to use
32bit displacements.  If the spills were put next to the frame
pointer, the assembler could use 8bit displacements.  I can simulate
this by using alloca to get the buffer; the code is almost identical,
but all the displacements are in the +-128 range and the object code
shrinks by 150 bytes.  (It does get it right with -fomit-frame-pointer
on.)

Using alloca is not an ideal workaround, because gcc insists on giving
the buffer's base a stack slot when it could perfectly well use the
machine stack pointer.  This would take some cleverness, but the logic
is the same that's needed for -fomit-frame-pointer.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28 17:20                               ` Zack Weinberg
@ 1999-01-28 17:33                                 ` Joern Rennecke
  1999-01-28 18:01                                 ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-28 17:33 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: law, egcs

> The top of the loop produced RTL like this:
> 
> (insn 166 162 167 (set (reg/v:QI 44)
>         (mem:QI (reg/v:SI 27) 0)) -1 (nil)
>     (nil))
> 
> (insn 167 166 169 (set (reg/v:SI 27)
>         (plus:SI (reg/v:SI 27)
>             (const_int 1))) -1 (nil)
>     (nil))
> 
> (note 169 167 476 "" NOTE_INSN_DELETED)
> 
> (insn 476 169 477 (set (reg:SI 76)
>         (zero_extend:SI (reg/v:QI 44))) -1 (nil)
>     (nil))
> 
> (insn 477 476 478 (set (cc0)
>         (compare (reg:SI 76)
>             (const_int 10))) -1 (nil)
>     (nil))
> ;; switch continues...
> 
> No pass was able to collapse regs 44 and 76 together, and we'd end up
> with assembly output like so:

optimize_reg_copy_3 in regmove.c has code to handle such zero / sign
extensions, but this works only if the source register is set but once.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28 17:20                               ` Zack Weinberg
  1999-01-28 17:33                                 ` Joern Rennecke
@ 1999-01-28 18:01                                 ` Jeffrey A Law
  1999-01-28 18:27                                   ` Zack Weinberg
  1 sibling, 1 reply; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-28 18:01 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901290119.UAA29185@blastula.phys.columbia.edu >you write:
  > On Thu, 28 Jan 1999 06:22:55 -0700, Jeffrey A Law wrote:
  > >
  > >  In message < 199901271725.MAA20390@blastula.phys.columbia.edu >you write:
  > >  > Nope.  No spills of reg 2 anywhere.
  > >Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
  > >think of a reason why it would not be used..
  > 
  > I can't reproduce this exact situation anymore, but it seems to want
  > to use reg 2 (and sometimes reg 1 also) for scratch purposes
  > exclusively.
Definitely something not working as expected.  I have vague memories of the
allocators not wanting to take the last register to avoid some pathological
problem in reload.  But I can't find evidence of that code anymore
(this is separate from the CLASS_LIKELY_SPILLED stuff in local-alloc.c).

If this shows up again, we definitely want to dive deeper into it.


  > Flow has the correct live range for pseudo 32, and the counts in lreg
  > are sane.  It now seems to get it right even when the variable is
  > declared with function scope.
Very odd.  So are you saying that pseudos 30 & 32 no longer conflict?  And
as such can share a reg?


  > The problem may have been with some atrociously tangled EOF handling
  > code which I have now thrown away.  There was another variable
  > trivially derived from `count' which was used in the inner loop. Some
  > pass (loop?) may have collapsed the two variables into one.
Quite possible.  Could be one of a number of passes.  cprop, regmove, local
and global all have some capabilities to try and tie registers together.

That's an interesting thought -- what kind of heuristics would be useful 
(particularly in local-alloc.c) to guess when tieing regs together is going
to lose...

  > 
  > Now here's an interesting thing.  The inner loop was originally coded
  > 
  > for(;;)
  > {
  >     unsigned char c;
  >     c = *ip++;
  >     switch(c)
  >     {
  > 	default:
  > 	    *op++ = c;
  > 	    break;
  > 	/* more cases here */
  >     }
  > }
  > 
  > The top of the loop produced RTL like this:
  > 
  > (insn 166 162 167 (set (reg/v:QI 44)
  >         (mem:QI (reg/v:SI 27) 0)) -1 (nil)
  >     (nil))
  > 
  > (insn 167 166 169 (set (reg/v:SI 27)
  >         (plus:SI (reg/v:SI 27)
  >             (const_int 1))) -1 (nil)
  >     (nil))
  > 
  > (note 169 167 476 "" NOTE_INSN_DELETED)
  > 
  > (insn 476 169 477 (set (reg:SI 76)
  >         (zero_extend:SI (reg/v:QI 44))) -1 (nil)
  >     (nil))
  > 
  > (insn 477 476 478 (set (cc0)
  >         (compare (reg:SI 76)
  >             (const_int 10))) -1 (nil)
  >     (nil))
  > ;; switch continues...
  > 
  > No pass was able to collapse regs 44 and 76 together, and we'd end up
  > with assembly output like so:
Well, I don't know what dump this came from, but I don't see a REG_DEAD note
for reg 44 on insn 477.  Without the REG_DEAD note the combination
opportunities are much more limited.

--


  > Another problem which could be related to Marc's code-size issues.
  > The stack frame generated for this function has a ~4K buffer and a
  > bunch of spilled pseudos.  The frame is laid out with the buffer
  > nearer to the frame pointer than the spills, so all the stack slot
  > offsets are large (range 4100-4150) and the assembler is forced to use
  > 32bit displacements.  If the spills were put next to the frame
  > pointer, the assembler could use 8bit displacements.  I can simulate
  > this by using alloca to get the buffer; the code is almost identical,
  > but all the displacements are in the +-128 range and the object code
  > shrinks by 150 bytes.  (It does get it right with -fomit-frame-pointer
  > on.)
  > 
  > Using alloca is not an ideal workaround, because gcc insists on giving
  > the buffer's base a stack slot when it could perfectly well use the
  > machine stack pointer.  This would take some cleverness, but the logic
  > is the same that's needed for -fomit-frame-pointer.
This kind of optimization is nontrivial.  Particularly when you have to work
on targets where the validity of an address may depend on those offsets.  So,
you move something, it's offset is suddenly too big.  You need to reload the
address, blam, you've got to move something else...  

Even for a target like the x86 where the displacements are all valid, it's
not an easy problem.  In fact, it's register allocation all over again, but
with a different metric.  You want to pack the most heavily used stack slots
(which could be spills, locals, temporaries, etc) into the smallest offsets,
and less used things further away...

jeff
  > 
  > zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28 18:01                                 ` Jeffrey A Law
@ 1999-01-28 18:27                                   ` Zack Weinberg
  1999-01-28 19:58                                     ` Jeffrey A Law
  0 siblings, 1 reply; 54+ messages in thread
From: Zack Weinberg @ 1999-01-28 18:27 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Thu, 28 Jan 1999 18:57:03 -0700, Jeffrey A Law wrote:
>
>  In message < 199901290119.UAA29185@blastula.phys.columbia.edu >you write:
>  > On Thu, 28 Jan 1999 06:22:55 -0700, Jeffrey A Law wrote:
>  > >
>  > >  In message < 199901271725.MAA20390@blastula.phys.columbia.edu >you write:
>  > >  > Nope.  No spills of reg 2 anywhere.
>  > >Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
>  > >think of a reason why it would not be used..
>  > 
>  > I can't reproduce this exact situation anymore, but it seems to want
>  > to use reg 2 (and sometimes reg 1 also) for scratch purposes
>  > exclusively.
>Definitely something not working as expected.  I have vague memories of the
>allocators not wanting to take the last register to avoid some pathological
>problem in reload.  But I can't find evidence of that code anymore
>(this is separate from the CLASS_LIKELY_SPILLED stuff in local-alloc.c).
>
>If this shows up again, we definitely want to dive deeper into it.

Well, here's something like it:

;; Register dispositions:
27 in 5  28 in 3  33 in 0  36 in 3  37 in 4  38 in 0  
40 in 0  43 in 0  50 in 4  57 in 0  69 in 0  73 in 0  
79 in 0  80 in 0  85 in 0  90 in 0  92 in 3  94 in 0  

;; Hard regs used:  0 1 3 4 5 6 7 16

The source that got this dump is rather different from the source I sent
this morning, in particular register numbering is totally shuffled.  I
don't know why it wants to use hard reg 0 for so much stuff.  The
final assembly is decent, but it doesn't use register C (==2) at all.

Where do I start diving?

>  > Flow has the correct live range for pseudo 32, and the counts in lreg
>  > are sane.  It now seems to get it right even when the variable is
>  > declared with function scope.
>Very odd.  So are you saying that pseudos 30 & 32 no longer conflict?  And
>as such can share a reg?

Pseudo 30 no longer exists.  Pseudo 32 is now pseudo 33 and gets put
in %eax with a whole pile of other stuff.

>  > The problem may have been with some atrociously tangled EOF handling
>  > code which I have now thrown away.  There was another variable
>  > trivially derived from `count' which was used in the inner loop. Some
>  > pass (loop?) may have collapsed the two variables into one.
>Quite possible.  Could be one of a number of passes.  cprop, regmove, local
>and global all have some capabilities to try and tie registers together.
>
>That's an interesting thought -- what kind of heuristics would be useful 
>(particularly in local-alloc.c) to guess when tieing regs together is going
>to lose...

It was a lose here because it caused a register spill in the common
case - but I don't know if gcc could've known it was the common case.
Profile-directed feedback anyone?

>  > (insn 166 162 167 (set (reg/v:QI 44)
>  >         (mem:QI (reg/v:SI 27) 0)) -1 (nil)
>  >     (nil))
>  > 
>  > (insn 167 166 169 (set (reg/v:SI 27)
>  >         (plus:SI (reg/v:SI 27)
>  >             (const_int 1))) -1 (nil)
>  >     (nil))
>  > 
>  > (note 169 167 476 "" NOTE_INSN_DELETED)
>  > 
>  > (insn 476 169 477 (set (reg:SI 76)
>  >         (zero_extend:SI (reg/v:QI 44))) -1 (nil)
>  >     (nil))
>  > 
>  > (insn 477 476 478 (set (cc0)
>  >         (compare (reg:SI 76)
>  >             (const_int 10))) -1 (nil)
>  >     (nil))
>  > ;; switch continues...
>  > 
>  > No pass was able to collapse regs 44 and 76 together, and we'd end up
>  > with assembly output like so:
>Well, I don't know what dump this came from, but I don't see a REG_DEAD note
>for reg 44 on insn 477.  Without the REG_DEAD note the combination
>opportunities are much more limited.

The dump is from initial rtl generation, but reg 44 isn't dead at this
point in the code.  However, on x86, it's dirt cheap to rewrite the
above to (set (reg:SI 76) (zero_extend:SI (mem:QI (reg:SI 27))) and
then replace (reg:QI 44) with (subreg:QI (reg:SI 76)) wherever it
appears.  We don't do that.  It would be a big win: it reduces
register pressure (always good on x86) and it would enable
optimizations based on known contents of reg 76.  This particular
function benefits from both.

>  > Another problem which could be related to Marc's code-size issues.
>  > The stack frame generated for this function has a ~4K buffer and a
>  > bunch of spilled pseudos.  The frame is laid out with the buffer
>  > nearer to the frame pointer than the spills, so all the stack slot
>  > offsets are large (range 4100-4150) and the assembler is forced to use
>  > 32bit displacements.  If the spills were put next to the frame
>  > pointer, the assembler could use 8bit displacements.
[...]
>This kind of optimization is nontrivial.  Particularly when you have to work
>on targets where the validity of an address may depend on those offsets.  So,
>you move something, it's offset is suddenly too big.  You need to reload the
>address, blam, you've got to move something else...  
>
>Even for a target like the x86 where the displacements are all valid, it's
>not an easy problem.  In fact, it's register allocation all over again, but
>with a different metric.  You want to pack the most heavily used stack slots
>(which could be spills, locals, temporaries, etc) into the smallest offsets,
>and less used things further away...

For starters, we seem to leave space for stack slots that are never
used; that should be easy to fix, no?

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28 18:27                                   ` Zack Weinberg
@ 1999-01-28 19:58                                     ` Jeffrey A Law
  0 siblings, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-28 19:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901290227.VAA29500@blastula.phys.columbia.edu >you write:
  > ;; Register dispositions:
  > 27 in 5  28 in 3  33 in 0  36 in 3  37 in 4  38 in 0  
  > 40 in 0  43 in 0  50 in 4  57 in 0  69 in 0  73 in 0  
  > 79 in 0  80 in 0  85 in 0  90 in 0  92 in 3  94 in 0  
  > 
  > ;; Hard regs used:  0 1 3 4 5 6 7 16
  > 
  > The source that got this dump is rather different from the source I sent
  > this morning, in particular register numbering is totally shuffled.  I
  > don't know why it wants to use hard reg 0 for so much stuff.  The
  > final assembly is decent, but it doesn't use register C (==2) at all.
Well, %eax is used most often because it's the first one listed in
REG_ALLOC_ORDER.  It's also automatically preferred for anything that is a
return value, or copied from a return value.

  > Where do I start diving?
Much like we did last time.

Find the first reg in the list of regs that are candidates for global
allocation.  Verify that #uses > 4 * # calls crossed.  Then put a breaking
in global.c:find_reg and step through it to find out why it's not getting
a hard reg.

  > It was a lose here because it caused a register spill in the common
  > case - but I don't know if gcc could've known it was the common case.
  > Profile-directed feedback anyone?
Well, we've got a few things that can be examined to build some heuristics
around.

We've got conflict lists.  If tying expands the conflict list, then that's
not good.  The question is how much expansion is too much expansion.

We've got lots of counters.

One could envision a heuristic like "the allocation priority of the tied
regs must be within some delta of the maximum of the alloation priorities of
the individual regs".  And the delta would inversely related to the expansion
of the conflict list.

Just a thought.  I've never tried anything like it.


  > The dump is from initial rtl generation, but reg 44 isn't dead at this
  > point in the code.  However, on x86, it's dirt cheap to rewrite the
  > above to (set (reg:SI 76) (zero_extend:SI (mem:QI (reg:SI 27))) and
  > then replace (reg:QI 44) with (subreg:QI (reg:SI 76)) wherever it
  > appears.  We don't do that.  It would be a big win: it reduces
  > register pressure (always good on x86) and it would enable
  > optimizations based on known contents of reg 76.  This particular
  > function benefits from both.
There's definite inconsistencies in our usage of zero-extend vs subregs on the
x86.  We were never able to reach any kind of conclusion about which one was
better to bias towards.

  > For starters, we seem to leave space for stack slots that are never
  > used; that should be easy to fix, no?
Depends on why they're unused.  Never assume anything is easy until you find
out why it's happening.

For example, maybe they're unused because reload inheritance eliminated the
need to ever store a value into those slots.  By the time we make that kind of
determination it's too late to eliminate the slot.  Similarly for reload_cse
eliminating the need for some slots.

Jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:38                       ` Zack Weinberg
  1999-01-26  4:59                         ` Jeffrey A Law
@ 1999-01-31 23:58                         ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 22:27:06 -0700, Jeffrey A Law wrote:
>
>
>
>  > ;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...
>  > 
>  > ;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
>  > 		 46 49 52 57 58 64 70 75 92 93 94 0 7
>Well, there's not likely to be any register available for 30.  It conflicts
>with 10 other registers that have a higher priority.  For a machine with 6
>user regs the chances are slim.  The only hope is that some of those regs
>in the conflict list can be allocated into a the same hard register.

Right.

>
>  > Register 30 used 53 times across 167 insns; crosses 3 calls; 
>  > 	GENERAL_REGS or none; pointer.
>  > 
>  > The odd thing is, the use count for reg 30 shouldn't be anywhere near
>  > that high.
>Use counts are weighted to give pseudos used in loops a better chance of being
>allocated into a register.

Ah, I see.

>  > It never says anything other than reg 0, 1, or 2, and it hits just
>  > about every insn in the function.
>Right.  We have to scan each insn in the function to see if any registers
>need to be spilled at that particular insn.

Those are hard register numbers, right?

>The next step is to find out what registers are used by the entries on the
>conflict list.  In the .greg file there should be something like this:
>
>;; Register dispositions:
>94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
>102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
>[ ... ]

23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  

which is strange: hard register 2 (ecx) isn't used for anything.  What
would cause that?

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:18                   ` Zack Weinberg
  1999-01-25 21:30                     ` Jeffrey A Law
@ 1999-01-31 23:58                     ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 21:49:32 -0700, Jeffrey A Law wrote:
>
>  In message < 199901260441.XAA09930@blastula.phys.columbia.edu >you write:
>  > Right.  The reason I thought it was alias, is that it gets a hard
>  > register if I change the extern char array declaration to be const.
>Interesting.  But that may work because it allows optimizations earlier in
>the compiler, and for whatever reason we can find a hard reg for (reg 30).

The only other difference it seems to make is that the other vars in
hard regs got shuffled around a bit.

>  > Register 30 used 53 times across 167 insns; set 7 times; user var; 
>  > 	crosses 3 calls; GENERAL_REGS or none; pointer.
>  > 
>  > (It's referenced in the outer loop and in several special cases
>  > of the inner loop.)
>OK.  So at the top of the .greg dump you there should be a list of registers
>in priority order with a conflict matrix.  That should give you some idea
>of how it fits in with the rest of the registers that need to be allocated
>by global alloc.  Probably worth sending the regs to allocate list up to
>and including reg 30.  Probably not worth sending the whole conflict list yet.


;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...

;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
		 46 49 52 57 58 64 70 75 92 93 94 0 7

Register 30 used 53 times across 167 insns; crosses 3 calls; 
	GENERAL_REGS or none; pointer.

The odd thing is, the use count for reg 30 shouldn't be anywhere near
that high.  It's used at the C level to reset the input buffer pointer
each time through the outer loop, and some code modifies it just
before jumping back to the beginning of the outer loop.  (GCC doesn't
seem to realize that it's going back to the beginning of the outer
loop - I could fix that with gotos, but it would be ugly.)

I'm much more interested in regs 44 and 75, actually. They're both the
same variable (one in QI, one in SI) which is critical in the inner
loop.  Both are ahead of reg 30; 75 gets a hard register and 44
doesn't.  Reg 44 is used for several different things, maybe I should
give them different C-level variables.

;; 44 conflicts: 22 24 25 26 27 28 29 30 31 33 34 44 52 57 58 64 75 92 93 94 7
;; 75 conflicts: 22 24 25 26 27 28 29 30 31 33 34 44 75 92 93 7

Register 44 used 33 times across 60 insns; dies in 4 places; crosses 1
	call; 1 bytes; pref Q_REGS, else GENERAL_REGS.

Register 75 used 18 times across 9 insns; dies in 2 places; 
	pref Q_REGS, else GENERAL_REGS.

>You also need to look in the .greg dump for messages like
>
>Spilling for insn 77.
>Spilling reg 1.
> Register 110 now in 31.
> Register 120 now in stack

There's a whole mess of 'em, but they are singularly uninformative.

Spilling for insn 4.
Spilling reg 0.
Spilling for insn 8.
Spilling reg 0.
Spilling for insn 10.
Spilling reg 0.
Spilling for insn 16.
Spilling reg 0.
...

It never says anything other than reg 0, 1, or 2, and it hits just
about every insn in the function.

There's no `Register X now in stack' message for any register.

>Probably the first thing to look for is "Register 30 now in stack" which
>would indicate that it got a register during global allocation, but the
>register it got was spilled and re-allocation failed.
>
>If there isn't any such message, then the reg 30 never got a hard register
>because it's priority was too low.
>
>Looking at the calls & uses info, if a caller-saved register is available, the
>compiler should use it (4 * #calls < #refs).

Are you looking at the same function as me?  (It's read_and_prescan
out of the cpplib patch I sent earlier this evening.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:41               ` Zack Weinberg
  1999-01-25 20:53                 ` Jeffrey A Law
@ 1999-01-31 23:58                 ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 21:30:10 -0700, Jeffrey A Law wrote:
>
>  In message < 199901260356.WAA09714@blastula.phys.columbia.edu >you write:
>  > On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
>  > >  > 	decl -4112(%ebp)
>  > >  > 	movl -4112(%ebp),%ecx
>  > >  > 
>  > >  > - if this were written the other way around, you'd avoid an address
>  > >  > generation, and a read-mod-write to memory; note that this particular
>  > >  > location is unlikely to be in L1.  Also the code would be smaller.
>  > >  > This looks like the problem Marc Espie keeps complaining about.
>  > >First, find out where and why we're getting this code.  Are we getting
>  > >reloads?  Look at the dump outputs, they're going to tell you more than
>  > >I ever could from looking at the assembly.
>  > 
>  > This one turns out to be alias analysis's fault.
>  > 
>  > C:
>  > 	/* begin basic block */
>  > 	*--ptr = '\n';
>  > 	break;
>  > 	/* end basic block */
>  > 
>  > where ptr is (ought to be) known to be pointing into a char array on
>  > the stack.
>Hmmm, it looks more like a spill than an alias problem.  ie, (reg 30) is not
>homed in a hard register, but is instead is homed in a stack slot.
>
>This is accomplished by changing (reg 30) to (mem (plus (fp) (offset))
>in every occurrence (BTW the expression (reg 30) is shared).
>
>The increment is still a valid insn after that transformation, so it's left
>alone.

Right.  The reason I thought it was alias, is that it gets a hard
register if I change the extern char array declaration to be const.

>Before looking for ways to optimize the reloaded code, we should look into
>the register assignments and reloading.
>
>ie, did (reg 30) initially get a hard register, but then get spilled because
>the hard register was needed elsewhere?  Or did (reg 30) simply not get a hard
>register because none was available.
>
>So, first find out if (reg 30) is local to a single basic block.

It's not. lreg says

Register 30 used 53 times across 167 insns; set 7 times; user var; 
	crosses 3 calls; GENERAL_REGS or none; pointer.

(It's referenced in the outer loop and in several special cases
of the inner loop.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-26  7:44         ` Joern Rennecke
  1999-01-27  8:35           ` Zack Weinberg
@ 1999-01-31 23:58           ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: law, amylaar, egcs

> There is questionable code elsewhere, such as:
> 
> 	decl -4112(%ebp)
> 	movl -4112(%ebp),%ecx

This is a CISC specific problem - load-store architectures don't see it since
they have to do a load first for the decrement, and then can inherit from the
store.
I think it could be tackled by something I'd call 'reverse inheritance' - you
remember when could have used the value in a register, and when you later
find a load of the same value, and the register is free from the desired use
to the load, the load can be moved upwards.
However, in some cases - like this one - this strategy increases code size
and issue bandwidth, so its use has to be conditionalized on ! optimize_size
and on compiling for a processor where simplicity of operations is more
important than the mere instruction count.

> C:
> 	buf = xrealloc (buf, op - buf);
> 	fp->buf = buf;
> 	return op - buf;
> 
> becomes
> 
> 	movl %ebx,%eax
> 	subl -4104(%ebp),%eax
> 	pushl %eax
> 	movl -4104(%ebp),%edx
> 	pushl %edx
> 	call xrealloc
> 	movl %eax,-4104(%ebp)
> 	movl 12(%ebp),%eax
> 	movl -4104(%ebp),%ecx
> 	movl %ecx,(%eax)
> 	subl %ecx,%ebx
> 	movl %ebx,%eax
> 	jmp epilogue

The first problem here is that (op - buf) hasn't been subjected to cse.
Is the address of buf taken anywhere in this function?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:52               ` Zack Weinberg
  1999-01-27 11:49                 ` Marc Espie
@ 1999-01-31 23:58                 ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: law, egcs

On Wed, 27 Jan 1999 17:07:37 +0000 (GMT), Joern Rennecke wrote:
>> Right before CSE, we have this RTL for the above code.  CSE makes no
>> changes except to annotate insns with patterns.  Flow thinks this is
>
>Well, why doesn't it use reg 86 in insn 620?

I've no idea.  I can do the CSE by hand, which produces better,
but still suboptimal assembler:

        subl -4104(%ebp),%ebx
        pushl %ebx
        movl -4104(%ebp),%ecx
        pushl %ecx
        call xrealloc
        movl %eax,-4104(%ebp)
        movl 12(%ebp),%eax
        movl -4104(%ebp),%edx
        movl %edx,(%eax)
        movl %ebx,%eax

Two issues here.  One is that after the call, %eax is spilled and
restored to %edx, so we can load the pointer at 12(%ebp) into %eax.
It's willing to use %edx here, so why not just load the pointer in
there?  Then we could generate

	movl 12(%ebp),%edx
	movl %eax,(%edx)
	movl %ebx,%eax

This might be lreg's fault.  Here's the RTL after lreg:

(insn 618 616 676 (set (reg/v:SI 26)
        (reg:SI 0 %eax)) 54 {movsi+2} (insn_list 616 (nil))
    (expr_list:REG_DEAD (reg:SI 0 %eax)
        (nil)))

(insn 676 618 621 (set (reg/v:SI 23)
        (mem:SI (plus:SI (reg:SI 16 %argp)
                (const_int 4)) 2)) 54 {movsi+2} (nil)
    (expr_list:REG_EQUIV (mem/f:SI (plus:SI (reg:SI 16 %argp)
                (const_int 4)) 2)
        (nil)))

(insn 621 676 624 (set (mem/s:SI (reg/v:SI 23) 5)
        (reg/v:SI 26)) 54 {movsi+2} (insn_list 618 (nil))
    (expr_list:REG_DEAD (reg/v:SI 26)
        (expr_list:REG_DEAD (reg/v:SI 23)
            (nil))))

(insn 624 621 626 (set (reg/i:SI 0 %eax)
        (reg/v:SI 86)) 54 {movsi+2} (nil)
    (expr_list:REG_DEAD (reg/v:SI 86)
        (nil)))

and lreg says of (reg/v:SI 23)
Register 23 used 2 times across 2 insns in block 58; set 1 time; 
	user var; GENERAL_REGS or none; pointer.

but it didn't give it a hard register...

The other issue is that before the call, we generate what looks to me
like silly assembler:

        subl -4104(%ebp),%ebx
        pushl %ebx
        movl -4104(%ebp),%ecx
        pushl %ecx

Which could be written

        movl -4104(%ebp),%ecx
        subl %ecx,%ebx
        pushl %ebx
        pushl %ecx

saving four bytes of machine code, and I'd think it's faster this way
too (but I'm no expert on Intel scheduling issues).

If I use -Os, it generates almost the assembly I suggested, but
there's a pile of other changes which invalidate the analysis: for
example, -4104(%ebp) is now in %edi apparently throughout the function.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-26  4:59                         ` Jeffrey A Law
  1999-01-27  9:26                           ` Zack Weinberg
@ 1999-01-31 23:58                           ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260538.AAA10305@blastula.phys.columbia.edu >you write:
  > >Right.  We have to scan each insn in the function to see if any registers
  > >need to be spilled at that particular insn.
  > 
  > Those are hard register numbers, right?
Close enough.  It's a little more complicated than that.  But yes, we have
to find points where (for example) a pseudo got the wrong kind of hard
register.

We emit copy insns to get the value in/out of the register we need.

We must spill any value which allocated to the hard register we need for the
length of time that we need a particular hard register.  [ This is a recent
improvement, we used to spill *everything* that was allocated to the hard
register we need. ]

The act of spilling at one site may cause spilling elsewhere, so it's a
repeat until nothing changes algorithm.


  > >The next step is to find out what registers are used by the entries on the
  > >conflict list.  In the .greg file there should be something like this:
  > >
  > >;; Register dispositions:
  > >94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
  > >102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
  > >[ ... ]
  > 
  > 23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
  > 43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
  > 75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  
  > 
  > which is strange: hard register 2 (ecx) isn't used for anything.  What
  > would cause that?
Check and see if there are any "Spilling reg 2." messages.  I bet you'll find
one for an insn over which pseudo 30 is live.


Anyway.

  Pseudo	Hard
    64		  0
    75		  0
    46		  0
    49		  0
    32		  1
    28		  3
    44		 None
    79		  0
    85		  0
    94		  0
    27		  4
    57		  5
    30		 None


As you mentioned, it might be worth doing a manual live range split so that
reg 44 can get allocated into a hard register rather than a stack slot.

It's interesting to note that pseudos 27 and 32 are the only registers
allocated to hard registers 4 and 1 respectively.  Presumably they have long
lifetimes (which makes them conflict with everything) and high usage counts
which gives them a high priority.

It would be interesting to know what user variables those pseudos correspond
to (or temporary expressions if they aren't a user variable).  They may be
candidates for splitting too.  It would also be interesting to relate them
back to the source to see if their weighted counts make any sense.

Consider a loop with a switch statement that is executed once every loop
iteration.  In different cases of the switch statement we have a use of a
pseudo.  Even though only one of the case statements will execute each
iteration of the loop, we might be summing their use counts.

Or more generally, I'm not sure if our weighted counting system looks at the
flow control within a loop at all.  Doing so may (or may not) be difficult.

That may (or may not) be important for this case.  I do remember a very old
(circa 1991) "bug" report about this.

The other thing to look at would be the accuracy of the live length count
since that's one of the other major components to compute priority.  I do not
believe we recompute that count after combine & regmove scramble insns.  Thus
they may be inaccurate.  Again, those inaccuracies may or may not be important
for this example.

I guess in summary, so far it looks like we don't have a register available,
but we need to make sure that the allocations we made before pseudo 30 make
sense.  ie, are those registers really used more often, or do we have a
problem with our counts?


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 11:47   ` Zack Weinberg
  1999-01-25 11:50     ` Joern Rennecke
  1999-01-25 12:00     ` Jeffrey A Law
@ 1999-01-31 23:58     ` Zack Weinberg
  2 siblings, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: egcs

On Mon, 25 Jan 1999 12:55:37 +0000 (GMT), Joern Rennecke wrote:
>> then c gets to stay in a register all the time, and the code runs
>> about twice as fast.  It seems to me we should be generating identical
>> code for both cases.
>
>Reload inheritance follows only the fall-through path of a branch.
>This might be the reason for the differences you are seeing.

Mm, possibly.  Is there a reason for that?

The generated code is a little silly even when adjusted:

	movb	(%esi),%dl
	incl	%esi
	xorl	%eax, %eax
	movb	%dl, %eax
	[pick case branch based on contents of eax...]

We could just as well do it

	xorl	%eax, %eax
	movb	(%esi), %al
	incl	%esi
	[switch...]

and free up a register, which is no small thing on x86.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 11:50     ` Joern Rennecke
@ 1999-01-31 23:58       ` Joern Rennecke
  0 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

> >Reload inheritance follows only the fall-through path of a branch.
> >This might be the reason for the differences you are seeing.
> 
> Mm, possibly.  Is there a reason for that?

Yes.  Reload keeps track of values in registers as it scans the instructions
one after the other.  When a label is reached, it discards the information.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:08             ` Joern Rennecke
  1999-01-27  9:52               ` Zack Weinberg
@ 1999-01-31 23:58               ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: amylaar, law, egcs

> Right before CSE, we have this RTL for the above code.  CSE makes no
> changes except to annotate insns with patterns.  Flow thinks this is

Well, why doesn't it use reg 86 in insn 620?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 21:30                     ` Jeffrey A Law
  1999-01-25 21:38                       ` Zack Weinberg
@ 1999-01-31 23:58                       ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  > ;; 28 regs to allocate: 64 75 46 49 32 28 44 79 85 94 27 57 30 ...
  > 
  > ;; 30 conflicts: 22 24 25 26 27 28 29 30 31 32 33 34 40 42 43 44 
  > 		 46 49 52 57 58 64 70 75 92 93 94 0 7
Well, there's not likely to be any register available for 30.  It conflicts
with 10 other registers that have a higher priority.  For a machine with 6
user regs the chances are slim.  The only hope is that some of those regs
in the conflict list can be allocated into a the same hard register.


  > Register 30 used 53 times across 167 insns; crosses 3 calls; 
  > 	GENERAL_REGS or none; pointer.
  > 
  > The odd thing is, the use count for reg 30 shouldn't be anywhere near
  > that high.
Use counts are weighted to give pseudos used in loops a better chance of being
allocated into a register.

  > I'm much more interested in regs 44 and 75, actually. They're both the
  > same variable (one in QI, one in SI) which is critical in the inner
  > loop.  Both are ahead of reg 30; 75 gets a hard register and 44
  > doesn't.  Reg 44 is used for several different things, maybe I should
  > give them different C-level variables.
Quite likely.  We have some code to do live range splitting at Cygnus.  I
haven't tried to make a case to integrate it into egcs as it will be made
obsolete by David Miller's new allocator.

  > 
OK.

  > It never says anything other than reg 0, 1, or 2, and it hits just
  > about every insn in the function.
Right.  We have to scan each insn in the function to see if any registers
need to be spilled at that particular insn.

  > There's no `Register X now in stack' message for any register.
OK.  That means the spills didn't force anything into the stack.  So (reg 30)
never got a hard register to begin with.  Given its position in the list
and the conflict matrix, I'm not suprised.

  > Are you looking at the same function as me?  (It's read_and_prescan
  > out of the cpplib patch I sent earlier this evening.)
I'm not looking at the C code at all, just the dump info you're providing.

The use/calls crossed counts you provided would make that pseudo a candidate
for caller-save optimizations.  However, there isn't a register available.

The next step is to find out what registers are used by the entries on the
conflict list.  In the .greg file there should be something like this:

;; Register dispositions:
94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
[ ... ]


Any register not listed is on the stack.

Find out what regs where used by conflicting pseudos with a higher priority.

I bet you'll find there isn't a free register that can be used for (reg 30).

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28  8:11             ` Jeffrey A Law
  1999-01-28 11:40               ` Zack Weinberg
@ 1999-01-31 23:58               ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901271634.LAA20086@blastula.phys.columbia.edu >you write:
  > On Tue, 26 Jan 1999 15:44:06 +0000 (GMT), Joern Rennecke wrote:
  > 
  > >> C:
  > >> 	buf = xrealloc (buf, op - buf);
  > >> 	fp->buf = buf;
  > >> 	return op - buf;
  > >> 
  > >> becomes
  > [...]
  > 
  > >The first problem here is that (op - buf) hasn't been subjected to cse.
  > >Is the address of buf taken anywhere in this function?
  > 
  > No, neither buf nor op has its address taken.
  > 
  > Right before CSE, we have this RTL for the above code.  CSE makes no
  > changes except to annotate insns with patterns.  Flow thinks this is
  > one basic block, and that reg 86 is dead after insn 608.
  > 
  > Local register allocation says reg 86 is bb-local but doesn't give it
  > a hard register.  Combine collapses reg 87 into reg 28.
There is no common subespression here because "buf" is changed (it's the return
value of the xrealloc call).

Thus, the value computed into pseudo 86 by insn 606 is not necessarily the same
as the value computed into pseudo 87 by insn 620.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25  4:56 ` Joern Rennecke
  1999-01-25 11:47   ` Zack Weinberg
@ 1999-01-31 23:58   ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

> then c gets to stay in a register all the time, and the code runs
> about twice as fast.  It seems to me we should be generating identical
> code for both cases.

Reload inheritance follows only the fall-through path of a branch.
This might be the reason for the differences you are seeing.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  8:35           ` Zack Weinberg
  1999-01-27  9:08             ` Joern Rennecke
  1999-01-28  8:11             ` Jeffrey A Law
@ 1999-01-31 23:58             ` Zack Weinberg
  2 siblings, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: law, egcs

On Tue, 26 Jan 1999 15:44:06 +0000 (GMT), Joern Rennecke wrote:

>> C:
>> 	buf = xrealloc (buf, op - buf);
>> 	fp->buf = buf;
>> 	return op - buf;
>> 
>> becomes
[...]

>The first problem here is that (op - buf) hasn't been subjected to cse.
>Is the address of buf taken anywhere in this function?

No, neither buf nor op has its address taken.

Right before CSE, we have this RTL for the above code.  CSE makes no
changes except to annotate insns with patterns.  Flow thinks this is
one basic block, and that reg 86 is dead after insn 608.

Local register allocation says reg 86 is bb-local but doesn't give it
a hard register.  Combine collapses reg 87 into reg 28.

(code_label 603 601 606 47 "")

(insn 606 603 608 (set (reg:SI 86)
        (minus:SI (reg/v:SI 28)
            (reg/v:SI 26))) -1 (nil)
    (nil))

(insn 608 606 610 (set (mem:SI (pre_dec:SI (reg:SI 7 %esp)) 0)
        (reg:SI 86)) -1 (nil)
    (nil))

(insn 610 608 612 (set (mem:SI (pre_dec:SI (reg:SI 7 %esp)) 0)
        (reg/v:SI 26)) -1 (nil)
    (nil))

(call_insn 612 610 614 (set (reg:SI 0 %eax)
        (call (mem:QI (symbol_ref:SI ("xrealloc")) 0)
            (const_int 8))) -1 (nil)
    (nil)
    (nil))

(insn 614 612 617 (set (reg/v:SI 26)
        (reg:SI 0 %eax)) -1 (nil)
    (nil))

(insn 617 614 620 (set (mem/s:SI (reg/v:SI 23) 5)
        (reg/v:SI 26)) -1 (nil)
    (nil))

(insn 620 617 622 (set (reg:SI 87)
        (minus:SI (reg/v:SI 28)
            (reg/v:SI 26))) -1 (nil)
    (nil))

(insn 622 620 624 (set (reg/i:SI 0 %eax)
        (reg:SI 87)) -1 (nil)
    (nil))

(jump_insn 624 622 625 (set (pc)
        (label_ref 654)) -1 (nil)
    (nil))

(barrier 625 624 626)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:31       ` Zack Weinberg
  1999-01-25 13:36         ` Jeffrey A Law
  1999-01-26  7:44         ` Joern Rennecke
@ 1999-01-31 23:58         ` Zack Weinberg
  2 siblings, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 12:56:16 -0700, Jeffrey A Law wrote:
>
>  In message < 199901251946.OAA06903@blastula.phys.columbia.edu >you write:
>
>  > The generated code is a little silly even when adjusted:
[...]
>The second sequence may actually be slower though.  The use of %eax and %al
>may trigger an interlock.  Someone would need to check.
>
>If we spilled, then we're probably going to lose regardless.  That's why
>I'd prefer to see us work on avoid the spills to start with by tightening
>up the x86 machine description.

The code is actually much worse than I thought, even when tweaked.

Input:

unsigned char *ip, *op;
for(;;)
{
    unsigned char c;
    c = *ip++;
    switch(c)
    {
	default: *op++ = c; break;
	case '\0': goto eof;
	case '\r': /* ... */ break;
	case '\n': /* ... */ break;
	case '?':  /* ... */ break;
    }
}
eof:

Current snapshot, -O2:

switch:
	movb (%esi),%al
	movb %al,-4125(%ebp)
	incl %esi
	xorl %eax,%eax
	movb -4125(%ebp),%al
	cmpl $10,%eax
	je case_one
	jg case_two_or_three
	testl %eax,%eax
	je case_four
	jmp default
	.p2align 4,,7
case_two_or_three:
	cmpl $13,%eax
	je case_two
	cmpl $63,%eax
	je case_three
default:
	movb -4125(%ebp),%dl
	movb %dl,(%ebx)
	incl %ebx
	jmp switch
	.p2align 4,,7
...

-4125(%ebp) will be in cache, of course, but this is still pretty
disgusting.

There is questionable code elsewhere, such as:

	decl -4112(%ebp)
	movl -4112(%ebp),%ecx

- if this were written the other way around, you'd avoid an address
generation, and a read-mod-write to memory; note that this particular
location is unlikely to be in L1.  Also the code would be smaller.
This looks like the problem Marc Espie keeps complaining about.

.L331:
	cmpb $63,-4125(%ebp)
	jne .L349
	movb 1(%esi),%cl
	movb %cl,-4125(%ebp)
	testb %cl,%cl
	jne .L333
	cmpl $0,-4120(%ebp)
	jne .L333
	decl -4112(%ebp)
	movl -4112(%ebp),%eax
	movb $63,(%eax)
	decl %eax
	movl %eax,-4112(%ebp)
	movb $63,(%eax)
	jmp .L311
	.p2align 4,,7
.L333:
	movzbl -4125(%ebp),%edi
	cmpb $0,trigraph_table(%edi)
	jne .L334

Here the store to -4125(%ebp) would be dead if it weren't used in the
movzbl at .L333.  I guess this is what Joern said about reloads not
inheriting down branches taken.

C:
	buf = xrealloc (buf, op - buf);
	fp->buf = buf;
	return op - buf;

becomes

	movl %ebx,%eax
	subl -4104(%ebp),%eax
	pushl %eax
	movl -4104(%ebp),%edx
	pushl %edx
	call xrealloc
	movl %eax,-4104(%ebp)
	movl 12(%ebp),%eax
	movl -4104(%ebp),%ecx
	movl %ecx,(%eax)
	subl %ecx,%ebx
	movl %ebx,%eax
	jmp epilogue

This could have been done

	movl -4104(%ebp), %eax
	subl %eax, %ebx
	pushl %ebx
	pushl %eax
	call xrealloc
	movl 12(%ebp), %ecx
	movl %eax, (%ecx)
	movl %ebx, %eax
	jmp epilogue

I think the key issue here is that it doesn't seem to know to preserve
values in call-saved registers over a function call.  It's somewhat
silly to be microoptimizing around a call to xrealloc, but this sort
of code is all over the place.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:14       ` John S. Dyson
@ 1999-01-31 23:58         ` John S. Dyson
  0 siblings, 0 replies; 54+ messages in thread
From: John S. Dyson @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: zack, amylaar, egcs

Jeffrey A Law said:
>   > 
>   > 	xorl	%eax, %eax
>   > 	movb	(%esi), %al
>   > 	incl	%esi
>   > 	[switch...]
>   > 
>   > and free up a register, which is no small thing on x86.
> The second sequence may actually be slower though.  The use of %eax and %al
> may trigger an interlock.  Someone would need to check.
> 
There are special hacks in the P6 that make the above sequence
work well.  That used to be the "correct" idiom on P5 and earlier
(because movzbl was slow.)  Since that was so common, the P6 does
execute the above code very well.

The only thing wrong with the above sequence (taken alone) is that
there might still be a stall from before the xorl.  That case doesn't
happen very often, so xorl %reg,%reg is still mostly better than
movl $0, %reg.  (The xorl doesn't know that the value after it's
operation is invariant based upon it's previous value, while the
movl does know.)  The movl is very big (the constant is a full
32 bits long), so is generally less desirable.

-- 
John                  | Never try to teach a pig to sing,
dyson@iquest.net      | it makes one look stupid
jdyson@nc.com         | and it irritates the pig.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-28  5:26                             ` Jeffrey A Law
  1999-01-28 17:20                               ` Zack Weinberg
@ 1999-01-31 23:58                               ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901271725.MAA20390@blastula.phys.columbia.edu >you write:
  > Nope.  No spills of reg 2 anywhere.
Hmmm.  Odd.  Does reg 2 show up in the conflict lists?  Otherwise I can't
think of a reason why it would not be used..

It's a caller-save register, so it should only be used for either values
which are not live across calls, or which have high use/#calls crossed
counts.

I think we already determined that pseudo 30 had a high enough ratio to use
a caller saved reg, so I'm rather suprised that pseudo 30 wasn't allocated
into hard reg 2.

It may be worth a breakpoint in global.c::find_reg to see what's going on.

You'll have to map from an allocno back to the pseudo.  Probably the easiest
way to do that is to look at reg_allocno[30].

I guess it is possible the allocno's use/#calls cross ratio isn't high enough
if multiple registers are assigned the same allocno.  Though I'm not sure if
global does that or not.

Anyway, once you get control in find_reg for the right allocno it should be
possible to determine why we didn't assign reg 30 to %ecx and emit save/restore
code around call sites.


  > Pseudo 27 was the input buffer pointer, which deserved a permanent
  > register.  Pseudo 32 was `count', a variable used heavily in the outer
  > loop but not anywhere in the inner loop - code like this:
  > 
  > for (;;)
  > {
  >   count = read (...);
  >   if (count < 0) goto error;
  >   else if (count == 0) break;
  >   offset += count;
  >   ip = ibase;
  >   ip[count] = '\0';
  >   if (offset > limit) { /* grow output buffer */ }
  > 
  >   while ((c = *ip++) != '\0')
  >     switch (c)
  >     {
  >       /* count not used anywhere in here */
  >     }
  > }
Ah.  I wonder if global is considering pseudo 32 live across the inner loop.
As far as I can tell it isn't really live across the inner loop and isn't
used in the inner loop.  In theory, we should be able to re-use its hard reg
for a different pseudo inside the loop.

Something else to investigate.  Why does psuedo 32 conflict with pseudo 30.

  > I put a block around the real useful range of `count' and declared it
  > there, and it now shares reg 1 with pseudo 96 which appears to be a
  > pointer to the `cpp_options' structure.
  >
  > 
  > That, plus inventing a variable to use instead of `c' inside the
  > switch, and adjusting some branches to go straight to their final
  > destinations, has fixed up the allocation pretty well.
Yup.  Manual LRS.  Probably more sophisticated than the stuff Cygnus did, but
it does show how effective such opts can be on this kind of target.


  > Reg 76 used to be reg 75.  Reg 44 now gets a hard register, and we're
  > using all the registers again.  The invented variable, which is reg
  > 56, does not get a hard register - that's unfortunate but not nearly
  > as bad.  I think it's because the code that uses reg 56 looks like
  > this:
  > 
  > 	case '?':
  > 	if (opts->trigraphs || opts->warn_trigraphs)
  > 	{
  > 	  char d;
  > 	  ... stuff involving d but not c ...
  > 	}
  > 	else
  > 	   *op++ = c;
  > 	break;
  > 
  > It would need to realize that c is dead if the conditional is true.
  > That's live-range-splitting again, yes?
I'd have to see all the code.  "c" may or may not be dead.


  > The failure to recognize that `count' is only live in the outer part
  > of the loop concerns me.  We do have live range *trimming*, don't we?
  > If so, why doesn't it work here?
I agree (see above).  It's not "trimming", but just basic life analysis.  

I wonder if count is used outside the for loop?  I think it's time to go
back to the flow and lreg dumps and see what information they give about
pseudo 32.


  > So yes, this code benefits considerably from live range splitting.  I
  > am not sure about the use-count issue, I'd have to stare a lot more.
  > Another place this code could win is if local register allocation
  > could handle a block like this itself:
  > 
  > {
  >   int x = read(...)
  >   if (x < 0)
  >     goto error;
  >   else if (x == 0)
  >     goto eof;
  >  
  >    /* do stuff with x here */
  > }
  > 
  > I think this is what you meant by `extended basic blocks' ?  There are
  > a bunch of cases like this.
An extended basic block is a maximal list of instructions with only one
entry point.  That kind of code would be an extended basic block since there
aren't any joiner nodes.

There are a couple of transformations done by local-alloc which are not
appropriate for EBBs, but it should (in theory) be easy to have those
transformations only apply within a BB (some of the copy opts in particular).

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27 11:49                 ` Marc Espie
@ 1999-01-31 23:58                   ` Marc Espie
  0 siblings, 0 replies; 54+ messages in thread
From: Marc Espie @ 1999-01-31 23:58 UTC (permalink / raw)
  To: zack; +Cc: egcs

In article < 199901271751.MAA20528@blastula.phys.columbia.edu > you write:

>The other issue is that before the call, we generate what looks to me
>like silly assembler:

>        subl -4104(%ebp),%ebx
>        pushl %ebx
>        movl -4104(%ebp),%ecx
>        pushl %ecx

>Which could be written

>        movl -4104(%ebp),%ecx
>        subl %ecx,%ebx
>        pushl %ebx
>        pushl %ecx

>saving four bytes of machine code, and I'd think it's faster this way
>too (but I'm no expert on Intel scheduling issues).

Thank you for looking at this issue. I don't know enough about egcs
internals nor x86 assembly yet to phrase things in such a precise
way.

Yep, this is exactly the kind of thing I've been seeing since 1.1.1,
even more obvious in some cases, such as mucking with a value directly
in memory, before bringing it in a register... without ANY branch or
code in sight that would justify such behavior.

Even though I don't know much x86 assembler, this does indeed look weird
to me.

I also tried profiling this kind of behavior.  Both flavors (in-memory
vs. register) do exhibit the same run-time... that's probably cache
at work. 

>If I use -Os, it generates almost the assembly I suggested, but
>there's a pile of other changes which invalidate the analysis: for
>example, -4104(%ebp) is now in %edi apparently throughout the function.

There's something I don't quite understand in the way egcs works.
It seems that all this register allocation behavior occurs only at the
global level.  What is weird is that sometimes, the resulting code can
obviously be improved, to the point that a human being can easily perform 
the incremental change needed. I would have thought there would have been
a peep-hole pass that would take care of this kind of back-end issues...

On another note,  I finally managed to build a complete openbsd system
with egcs990124.  Code quality has vastly improved in other areas since
1.1.1. The size of -Os optimized code is much closer than what gcc 2.8.1 -O2
used to give.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 12:00     ` Jeffrey A Law
  1999-01-25 13:14       ` John S. Dyson
  1999-01-25 13:31       ` Zack Weinberg
@ 1999-01-31 23:58       ` Jeffrey A Law
  2 siblings, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901251946.OAA06903@blastula.phys.columbia.edu >you write:
  > On Mon, 25 Jan 1999 12:55:37 +0000 (GMT), Joern Rennecke wrote:
  > >> then c gets to stay in a register all the time, and the code runs
  > >> about twice as fast.  It seems to me we should be generating identical
  > >> code for both cases.
  > >
  > >Reload inheritance follows only the fall-through path of a branch.
  > >This might be the reason for the differences you are seeing.
  > 
  > Mm, possibly.  Is there a reason for that?
Yes, it allows you to inherit more often.


  > The generated code is a little silly even when adjusted:
  > 
  > 	movb	(%esi),%dl
  > 	incl	%esi
  > 	xorl	%eax, %eax
  > 	movb	%dl, %eax
  > 	[pick case branch based on contents of eax...]
  > 
  > We could just as well do it
  > 
  > 	xorl	%eax, %eax
  > 	movb	(%esi), %al
  > 	incl	%esi
  > 	[switch...]
  > 
  > and free up a register, which is no small thing on x86.
The second sequence may actually be slower though.  The use of %eax and %al
may trigger an interlock.  Someone would need to check.

If we spilled, then we're probably going to lose regardless.  That's why
I'd prefer to see us work on avoid the spills to start with by tightening
up the x86 machine description.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:33             ` Jeffrey A Law
  1999-01-25 20:41               ` Zack Weinberg
@ 1999-01-31 23:58               ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260356.WAA09714@blastula.phys.columbia.edu >you write:
  > On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
  > >  > 	decl -4112(%ebp)
  > >  > 	movl -4112(%ebp),%ecx
  > >  > 
  > >  > - if this were written the other way around, you'd avoid an address
  > >  > generation, and a read-mod-write to memory; note that this particular
  > >  > location is unlikely to be in L1.  Also the code would be smaller.
  > >  > This looks like the problem Marc Espie keeps complaining about.
  > >First, find out where and why we're getting this code.  Are we getting
  > >reloads?  Look at the dump outputs, they're going to tell you more than
  > >I ever could from looking at the assembly.
  > 
  > This one turns out to be alias analysis's fault.
  > 
  > C:
  > 	/* begin basic block */
  > 	*--ptr = '\n';
  > 	break;
  > 	/* end basic block */
  > 
  > where ptr is (ought to be) known to be pointing into a char array on
  > the stack.
Hmmm, it looks more like a spill than an alias problem.  ie, (reg 30) is not
homed in a hard register, but is instead is homed in a stack slot.

This is accomplished by changing (reg 30) to (mem (plus (fp) (offset))
in every occurrence (BTW the expression (reg 30) is shared).

The increment is still a valid insn after that transformation, so it's left
alone.

We have to load the value out of the stack slot for insn 211 because we do
not have indirect addressing (ie, (mem (mem ...)) is not a valid operand).
That's where insn 689 comes from.

Before looking for ways to optimize the reloaded code, we should look into
the register assignments and reloading.

ie, did (reg 30) initially get a hard register, but then get spilled because
the hard register was needed elsewhere?  Or did (reg 30) simply not get a hard
register because none was available.

So, first find out if (reg 30) is local to a single basic block.  You can find
that info in the .lreg file.  It'll say something like:

Register 99 used 2 times across 4 insns in block 0; set 1 time; GENERAL_REGS or 

If it doesn't say "in block X", then the register is live across multiple
blocks and can only be allocated by global allocation.  If it does say
"in block X", then it *may* be allocated by local allocation.

If it is allocated by local allocation, then in the .lreg file there'll be
a message like:

;; Register 95 in 19.

Which means pseudo 95 was assigned to hard register 19 (obviously the #s will
change, I'm taking tidbits from a PA dump).


What you find will determine what we look for in the .greg dump.


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* a strange infelicity of register allocation
  1999-01-23 22:42 a strange infelicity of register allocation Zack Weinberg
  1999-01-25  4:56 ` Joern Rennecke
@ 1999-01-31 23:58 ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs

Consider this loop:

char *p = buffer, *op = out_buffer;
for (;;)
{
    char c = *p++;
    switch (c)
    {
	case '\0':
	   goto out;
	case '\n':
	   /* stuff... */
	   break;
	case '\r':
	    /* similar stuff... */
	    break;
	case '?':
	   /* other stuff... */
	   break;
	default:
	   *op++ = c;
    }
}
out:

Where each special case is long and complicated.  Some of them re-use
the variable c for their own purposes, but the initial store to c is
dead at the beginning of each special case.

The code generated copies c = *p into a register, then out to a stack
slot.  It then does the switch based on the contents of the register.
Down in the default case, c is copied back in from the stack slot and
then written to memory.

If I move the default case to the top of the switch, so it begins like

switch (c)
{
    default:
	*op++ = c;
	break;

then c gets to stay in a register all the time, and the code runs
about twice as fast.  It seems to me we should be generating identical
code for both cases.

Platform is x86; problem was seen with both egcs 1.1.1 and the current
snapshot.  The actual code I'm talking about is the function
read_and_prescan in the patch for newline handling in cpplib which I
just posted.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 13:36         ` Jeffrey A Law
  1999-01-25 19:56           ` Zack Weinberg
@ 1999-01-31 23:58           ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901252130.QAA07655@blastula.phys.columbia.edu >you write:
  > Current snapshot, -O2:
  > 
  > switch:
  > 	movb (%esi),%al
  > 	movb %al,-4125(%ebp)
  > 	incl %esi
  > 	xorl %eax,%eax
  > 	movb -4125(%ebp),%al
  > 	cmpl $10,%eax
  > 	je case_one
  > 	jg case_two_or_three
  > 	testl %eax,%eax
  > 	je case_four
  > 	jmp default
  > 	.p2align 4,,7
  > case_two_or_three:
  > 	cmpl $13,%eax
  > 	je case_two
  > 	cmpl $63,%eax
  > 	je case_three
  > default:
  > 	movb -4125(%ebp),%dl
  > 	movb %dl,(%ebx)
  > 	incl %ebx
  > 	jmp switch
  > 	.p2align 4,,7
  > ...
  > 
  > -4125(%ebp) will be in cache, of course, but this is still pretty
  > disgusting.
  > 
  > There is questionable code elsewhere, such as:
  > 
  > 	decl -4112(%ebp)
  > 	movl -4112(%ebp),%ecx
  > 
  > - if this were written the other way around, you'd avoid an address
  > generation, and a read-mod-write to memory; note that this particular
  > location is unlikely to be in L1.  Also the code would be smaller.
  > This looks like the problem Marc Espie keeps complaining about.
First, find out where and why we're getting this code.  Are we getting
reloads?  Look at the dump outputs, they're going to tell you more than
I ever could from looking at the assembly.

  > I think the key issue here is that it doesn't seem to know to preserve
  > values in call-saved registers over a function call.  It's somewhat
  > silly to be microoptimizing around a call to xrealloc, but this sort
  > of code is all over the place.
No, gcc knows how to do that just fine.  See caller-save.c.

It may be the case that it does not think it is profitable to do so.  If so,
you would need to find out why.

But the first and most important thing is to start looking at rtl, not
the assembly code and find out where things start to go wrong.  We can sit
here and speculate about caller-saves, regmove, register allocation, spills,
etc, but it does not do us any good.  You need to look at how we got to this
bad assembly code, step by step.  I recommend starting with either reload or
local allocation, then working either forward or backwards depending on what
you find.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 20:53                 ` Jeffrey A Law
  1999-01-25 21:18                   ` Zack Weinberg
@ 1999-01-31 23:58                   ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, egcs

  In message < 199901260441.XAA09930@blastula.phys.columbia.edu >you write:
  > Right.  The reason I thought it was alias, is that it gets a hard
  > register if I change the extern char array declaration to be const.
Interesting.  But that may work because it allows optimizations earlier in
the compiler, and for whatever reason we can find a hard reg for (reg 30).

  > Register 30 used 53 times across 167 insns; set 7 times; user var; 
  > 	crosses 3 calls; GENERAL_REGS or none; pointer.
  > 
  > (It's referenced in the outer loop and in several special cases
  > of the inner loop.)
OK.  So at the top of the .greg dump you there should be a list of registers
in priority order with a conflict matrix.  That should give you some idea
of how it fits in with the rest of the registers that need to be allocated
by global alloc.  Probably worth sending the regs to allocate list up to
and including reg 30.  Probably not worth sending the whole conflict list yet.

You also need to look in the .greg dump for messages like

Spilling for insn 77.
Spilling reg 1.
 Register 110 now in 31.
 Register 120 now in stack


Probably the first thing to look for is "Register 30 now in stack" which
would indicate that it got a register during global allocation, but the
register it got was spilled and re-allocation failed.

If there isn't any such message, then the reg 30 never got a hard register
because it's priority was too low.

Looking at the calls & uses info, if a caller-saved register is available, the
compiler should use it (4 * #calls < #refs).


jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-27  9:26                           ` Zack Weinberg
  1999-01-28  5:26                             ` Jeffrey A Law
@ 1999-01-31 23:58                             ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Tue, 26 Jan 1999 05:55:21 -0700, Jeffrey A Law wrote:
>  > >The next step is to find out what registers are used by the
>  > >entries on the conflict list.  In the .greg file there should be
>  > >something like this:
>  > >
>  > >;; Register dispositions:
>  > >94 in 5  96 in 7  97 in 20  98 in 5  99 in 4  100 in 9
>  > >102 in 13  104 in 14  106 in 15  108 in 16  110 in 31  112 in 22
>  > >[ ... ]
>  > 
>  > 23 in 0  27 in 4  28 in 3  32 in 1  40 in 0  42 in 3  
>  > 43 in 5  46 in 0  49 in 0  57 in 5  64 in 0  70 in 0  
>  > 75 in 0  79 in 0  80 in 0  85 in 0  94 in 0  
>  > 
>  > which is strange: hard register 2 (ecx) isn't used for anything.  What
>  > would cause that?
>Check and see if there are any "Spilling reg 2." messages.  I bet you'll find
>one for an insn over which pseudo 30 is live.

Nope.  No spills of reg 2 anywhere.

>As you mentioned, it might be worth doing a manual live range split so that
>reg 44 can get allocated into a hard register rather than a stack slot.
>
>It's interesting to note that pseudos 27 and 32 are the only registers
>allocated to hard registers 4 and 1 respectively.  Presumably they have long
>lifetimes (which makes them conflict with everything) and high usage counts
>which gives them a high priority.
>
>It would be interesting to know what user variables those pseudos correspond
>to (or temporary expressions if they aren't a user variable).  They may be
>candidates for splitting too.  It would also be interesting to relate them
>back to the source to see if their weighted counts make any sense.

Pseudo 27 was the input buffer pointer, which deserved a permanent
register.  Pseudo 32 was `count', a variable used heavily in the outer
loop but not anywhere in the inner loop - code like this:

for (;;)
{
  count = read (...);
  if (count < 0) goto error;
  else if (count == 0) break;
  offset += count;
  ip = ibase;
  ip[count] = '\0';
  if (offset > limit) { /* grow output buffer */ }

  while ((c = *ip++) != '\0')
    switch (c)
    {
      /* count not used anywhere in here */
    }
}

I put a block around the real useful range of `count' and declared it
there, and it now shares reg 1 with pseudo 96 which appears to be a
pointer to the `cpp_options' structure.

That, plus inventing a variable to use instead of `c' inside the
switch, and adjusting some branches to go straight to their final
destinations, has fixed up the allocation pretty well.

;; Register dispositions:
23 in 0  27 in 4  28 in 3  29 in 5  35 in 1  40 in 0  
42 in 3  43 in 5  44 in 2  46 in 0  49 in 0  58 in 0  
65 in 0  76 in 0  80 in 0  81 in 0  86 in 0  96 in 1  
;; Hard regs used:  0 1 2 3 4 5 6 7 16

Reg 76 used to be reg 75.  Reg 44 now gets a hard register, and we're
using all the registers again.  The invented variable, which is reg
56, does not get a hard register - that's unfortunate but not nearly
as bad.  I think it's because the code that uses reg 56 looks like
this:

	case '?':
	if (opts->trigraphs || opts->warn_trigraphs)
	{
	  char d;
	  ... stuff involving d but not c ...
	}
	else
	   *op++ = c;
	break;

It would need to realize that c is dead if the conditional is true.
That's live-range-splitting again, yes?

The failure to recognize that `count' is only live in the outer part
of the loop concerns me.  We do have live range *trimming*, don't we?
If so, why doesn't it work here?

So yes, this code benefits considerably from live range splitting.  I
am not sure about the use-count issue, I'd have to stare a lot more.
Another place this code could win is if local register allocation
could handle a block like this itself:

{
  int x = read(...)
  if (x < 0)
    goto error;
  else if (x == 0)
    goto eof;
 
   /* do stuff with x here */
}

I think this is what you meant by `extended basic blocks' ?  There are
a bunch of cases like this.

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: a strange infelicity of register allocation
  1999-01-25 19:56           ` Zack Weinberg
  1999-01-25 20:33             ` Jeffrey A Law
@ 1999-01-31 23:58             ` Zack Weinberg
  1 sibling, 0 replies; 54+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: Joern Rennecke, egcs

On Mon, 25 Jan 1999 14:32:40 -0700, Jeffrey A Law wrote:
>  > 	decl -4112(%ebp)
>  > 	movl -4112(%ebp),%ecx
>  > 
>  > - if this were written the other way around, you'd avoid an address
>  > generation, and a read-mod-write to memory; note that this particular
>  > location is unlikely to be in L1.  Also the code would be smaller.
>  > This looks like the problem Marc Espie keeps complaining about.
>First, find out where and why we're getting this code.  Are we getting
>reloads?  Look at the dump outputs, they're going to tell you more than
>I ever could from looking at the assembly.

This one turns out to be alias analysis's fault.

C:
	/* begin basic block */
	*--ptr = '\n';
	break;
	/* end basic block */

where ptr is (ought to be) known to be pointing into a char array on
the stack.

Just before reload:

;; Start of basic block 17, registers live: 6 [bp] 7 [sp] 22 24 25 26 27 28 29 30 31 33 34 92 93
(insn 209 207 211 (set (reg/v:SI 30)
        (plus:SI (reg/v:SI 30)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 211 209 213 (set (mem:QI (reg/v:SI 30) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 209 (nil))
    (nil))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17


Perfectly rational.

After reload:

;; Start of basic block 17, registers live: 3 [bx] 4 [si] 6 [bp] 7 [sp]
(insn 209 207 689 (set (mem:SI (plus:SI (reg:SI 6 %ebp)
                (const_int -4112)) 0)
        (plus:SI (mem:SI (plus:SI (reg:SI 6 %ebp)
                    (const_int -4112)) 0)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 689 209 211 (set (reg:SI 2 %ecx)
        (mem:SI (plus:SI (reg:SI 6 %ebp)
                (const_int -4112)) 0)) 54 {movsi+2} (nil)
    (nil))

(insn 211 689 213 (set (mem:QI (reg:SI 2 %ecx) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 689 (insn_list 209 (nil)))
    (expr_list:REG_DEAD (reg:SI 2 %ecx)
        (nil)))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17

So it decided to do the pointer decrement in memory, then fetch it
(new insn 689) and write.  Why?  Well, there's a reference to an
extern char array in this function - nowhere near bb 17, but never
mind.  It's only ever used as an rvalue.  If I label it `const', we
get after reload:

;; Start of basic block 17, registers live: 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp]
(insn 209 207 211 (set (reg/v:SI 5 %edi)
        (plus:SI (reg/v:SI 5 %edi)
            (const_int -1))) 148 {addsi3+1} (nil)
    (nil))

(insn 211 209 213 (set (mem:QI (reg/v:SI 5 %edi) 0)
        (const_int 13)) 64 {movqi+1} (insn_list 209 (insn_list 209 (nil)))
    (nil))

(jump_insn 213 211 197 (set (pc)
        (label_ref 153)) 288 {jump} (nil)
    (nil))
;; End of basic block 17

from which I conclude that it thinks that the extern char array can
alias something, and it has to keep the pointer consistent in memory
just in case.

This is surprising to me because all the char pointers in this
function should have been pegged as either on the stack or in the
malloc arena by alias analysis.  The extern char array is in data
space, which can't collide with either, so we ought to be able to
assume it is unmodified even if it isn't marked const.  (Yes, I used
-fstrict-aliasing.)

zw

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~1999-01-31 23:58 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-23 22:42 a strange infelicity of register allocation Zack Weinberg
1999-01-25  4:56 ` Joern Rennecke
1999-01-25 11:47   ` Zack Weinberg
1999-01-25 11:50     ` Joern Rennecke
1999-01-31 23:58       ` Joern Rennecke
1999-01-25 12:00     ` Jeffrey A Law
1999-01-25 13:14       ` John S. Dyson
1999-01-31 23:58         ` John S. Dyson
1999-01-25 13:31       ` Zack Weinberg
1999-01-25 13:36         ` Jeffrey A Law
1999-01-25 19:56           ` Zack Weinberg
1999-01-25 20:33             ` Jeffrey A Law
1999-01-25 20:41               ` Zack Weinberg
1999-01-25 20:53                 ` Jeffrey A Law
1999-01-25 21:18                   ` Zack Weinberg
1999-01-25 21:30                     ` Jeffrey A Law
1999-01-25 21:38                       ` Zack Weinberg
1999-01-26  4:59                         ` Jeffrey A Law
1999-01-27  9:26                           ` Zack Weinberg
1999-01-28  5:26                             ` Jeffrey A Law
1999-01-28 17:20                               ` Zack Weinberg
1999-01-28 17:33                                 ` Joern Rennecke
1999-01-28 18:01                                 ` Jeffrey A Law
1999-01-28 18:27                                   ` Zack Weinberg
1999-01-28 19:58                                     ` Jeffrey A Law
1999-01-31 23:58                               ` Jeffrey A Law
1999-01-31 23:58                             ` Zack Weinberg
1999-01-31 23:58                           ` Jeffrey A Law
1999-01-31 23:58                         ` Zack Weinberg
1999-01-31 23:58                       ` Jeffrey A Law
1999-01-31 23:58                     ` Zack Weinberg
1999-01-31 23:58                   ` Jeffrey A Law
1999-01-31 23:58                 ` Zack Weinberg
1999-01-31 23:58               ` Jeffrey A Law
1999-01-31 23:58             ` Zack Weinberg
1999-01-31 23:58           ` Jeffrey A Law
1999-01-26  7:44         ` Joern Rennecke
1999-01-27  8:35           ` Zack Weinberg
1999-01-27  9:08             ` Joern Rennecke
1999-01-27  9:52               ` Zack Weinberg
1999-01-27 11:49                 ` Marc Espie
1999-01-31 23:58                   ` Marc Espie
1999-01-31 23:58                 ` Zack Weinberg
1999-01-31 23:58               ` Joern Rennecke
1999-01-28  8:11             ` Jeffrey A Law
1999-01-28 11:40               ` Zack Weinberg
1999-01-31 23:58               ` Jeffrey A Law
1999-01-31 23:58             ` Zack Weinberg
1999-01-31 23:58           ` Joern Rennecke
1999-01-31 23:58         ` Zack Weinberg
1999-01-31 23:58       ` Jeffrey A Law
1999-01-31 23:58     ` Zack Weinberg
1999-01-31 23:58   ` Joern Rennecke
1999-01-31 23:58 ` Zack Weinberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).