Re-run of loop pass

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re-run of loop pass
@ 1998-10-17 19:32 Michael Hayes
  1998-10-18  0:15 ` Jeffrey A Law
  1998-10-18  6:11 ` Toon Moene
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-17 19:32 UTC (permalink / raw)
  To: egcs; +Cc: Michael Hayes

What's the philosophy of rerunning the loop optimisation pass twice?

I've noticed that this can generate poorer code when unrolling loops
or using the bct optimisation if a BIV is eliminated on the first
pass.

On the second pass, loop_iterations is often unable to determine the
iteration count for loops with limits that should be able to be
determined at compile time.  Loop unrolling or the bct optimisation
then makes a dog's breakfast of things.

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-17 19:32 Re-run of loop pass Michael Hayes
@ 1998-10-18  0:15 ` Jeffrey A Law
  1998-10-18  3:11   ` Michael Hayes
  1998-10-21 23:18   ` Richard Henderson
  1998-10-18  6:11 ` Toon Moene
  1 sibling, 2 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-10-18  0:15 UTC (permalink / raw)
  To: Michael Hayes; +Cc: egcs

  In message < 13865.9178.455954.70561@ongaonga.elec.canterbury.ac.nz >you write:
  > I've noticed that this can generate poorer code when unrolling loops
  > or using the bct optimisation if a BIV is eliminated on the first
  > pass.
  > 
  > On the second pass, loop_iterations is often unable to determine the
  > iteration count for loops with limits that should be able to be
  > determined at compile time.  Loop unrolling or the bct optimisation
  > then makes a dog's breakfast of things.
We've generally gotten better code.

You should investigate why the second loop pass is unable to determine
loop iteration counts.

jeff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18  0:15 ` Jeffrey A Law
@ 1998-10-18  3:11   ` Michael Hayes
  1998-10-18 11:04     ` David Edelsohn
  1998-10-18 12:43     ` Mark Mitchell
  1998-10-21 23:18   ` Richard Henderson
  1 sibling, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-18  3:11 UTC (permalink / raw)
  To: law; +Cc: Michael Hayes, egcs

Jeffrey A Law writes:
 > You should investigate why the second loop pass is unable to determine
 > loop iteration counts.

Consider a simple dot product function like the following:

float foo(float *a, float *b)
{
  int i;
  float result = 0.0;

  for (i = 0; i < 4; i++)
    result += a[i] * b[i];

  return result;
}

The first pass often eliminates the BIV associated with i and replaces
the end of loop test i < 4 with a comparison of the incremented
pointer associated with a + i with a + 4 (I think the elimination is
only performed for targets with autoincrement addressing modes).  The
second pass then finds that the initial value of BIV associated with a
is not a constant and thus cannot determine the iteration count.

I imagine that the information regarding iteration counts should be
gathered on the first pass and then used on the second pass if needed.

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-17 19:32 Re-run of loop pass Michael Hayes
  1998-10-18  0:15 ` Jeffrey A Law
@ 1998-10-18  6:11 ` Toon Moene
  1998-10-18 17:19   ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Toon Moene @ 1998-10-18  6:11 UTC (permalink / raw)
  To: Michael Hayes; +Cc: egcs

[ Because I invented this hack, it's appropriate I write an
  apology on it - however, don't expect a Platonesk Dialogue ]

>  What's the philosophy of rerunning the loop optimisation
>  pass twice?

It's a poor man's device to cover up that the various *_giv  
routines in loop.c can't handle general induction variables of the  
form CONSTANT + INVARIANT + BIV * INVARIANT.  To quote my own  
message to the egcs mailing list (before it went public - you won't  
find this in the archives):

<QUOTE>
2. GIVs of the form (INVARIANT * BIV + INVARIANT + CONSTANT) are  
not recognised by strength reduction.

These givs are generated in the address expressions resulting from  
accessing arrays that are not 0-based.  Because Fortran arrays that  
are not explicitly declared as being 0-based are 1-based, strength  
reduction is not very effective for loops in Fortran (in fact, part  
of the above expression is reduced).

Short term solution - run loop optimisation twice to get the  
expression completely reduced.

Long term solution:  Make the various *_giv_* routines in loop.c  
recognise the above expression as a giv - this might not even be  
hard for the person who _wrote_ those routines in the first place.
</QUOTE>

>  I've noticed that this can generate poorer code when
>  unrolling loops or using the bct optimisation if a BIV
>  is eliminated on the first pass.

I do not completely understand this remark.  The sequence is:

first-pass-through-loop; second-pass-through-loop;  
pass-through-unroll; bct-optimisations.

How can the elimination of a BIV be so important then ?  It can  
either happen in the first or the second pass through loop, but that  
shouldn't matter.

>  On the second pass, loop_iterations is often unable to
>  determine the iteration count for loops with limits that
>  should be able to be determined at compile time.  Loop
>  unrolling or the bct optimisation then makes a dog's
>  breakfast of things.

Do you have an example of this at hand ?  I fail to see how this  
could be happening.

Thanks,
Toon.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18  3:11   ` Michael Hayes
@ 1998-10-18 11:04     ` David Edelsohn
  1998-10-18 12:43     ` Mark Mitchell
  1 sibling, 0 replies; 23+ messages in thread
From: David Edelsohn @ 1998-10-18 11:04 UTC (permalink / raw)
  To: Michael Hayes; +Cc: law, egcs

>>>>> Michael Hayes writes:

Michael> The first pass often eliminates the BIV associated with i and replaces
Michael> the end of loop test i < 4 with a comparison of the incremented
Michael> pointer associated with a + i with a + 4 (I think the elimination is
Michael> only performed for targets with autoincrement addressing modes).  The
Michael> second pass then finds that the initial value of BIV associated with a
Michael> is not a constant and thus cannot determine the iteration count.

	I think that this is addressed by the following comment in
unroll.c:loop_iterations()

  /* ?? Final value and initial value do not have to be constants.
     Only their difference has to be constant.  When the iteration
variable
     is an array address, the final value and initial value might both
     be addresses with the same base but different constant offsets.
     Final value must be invariant for this to work.

     To do this, need some way to find the values of registers which are
     invariant.  */

I guess this is not yet implemented.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18  3:11   ` Michael Hayes
  1998-10-18 11:04     ` David Edelsohn
@ 1998-10-18 12:43     ` Mark Mitchell
  1998-10-18 18:52       ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Mark Mitchell @ 1998-10-18 12:43 UTC (permalink / raw)
  To: m.hayes; +Cc: law, m.hayes, egcs

>>>>> "Michael" == Michael Hayes <m.hayes@elec.canterbury.ac.nz> writes:

    Michael> The first pass often eliminates the BIV associated with i
    Michael> and replaces the end of loop test i < 4 with a comparison
    Michael> of the incremented pointer associated with a + i with a +
    Michael> 4 (I think the elimination is only performed for targets
    Michael> with autoincrement addressing modes).  The second pass
    Michael> then finds that the initial value of BIV associated with
    Michael> a is not a constant and thus cannot determine the
    Michael> iteration count.

Or the second pass could learn to recongize some of these idioms.
For example, you seem to imply that if I write:

  int* i;
  int* j;
  for (i = j ; i < j + 4; ++j)
    ;

say, that loop can't figure out the number of iterations.  A little
algebra might be helpful here, since this is a reasonably common
idiom, especially in C++ using STL, and especially where `4' is
replaced by `k'.  So much so that if no-one else does this, I'll
probably get around to it.

-- 
Mark Mitchell 			mark@markmitchell.com
Mark Mitchell Consulting	http://www.markmitchell.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18  6:11 ` Toon Moene
@ 1998-10-18 17:19   ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-18 17:19 UTC (permalink / raw)
  To: Toon Moene; +Cc: Michael Hayes, egcs

Toon Moene writes:
 > I do not completely understand this remark.  The sequence is:
 > 
 > first-pass-through-loop; second-pass-through-loop;  
 > pass-through-unroll; bct-optimisations.
 > 
 > How can the elimination of a BIV be so important then ?  It can  
 > either happen in the first or the second pass through loop, but that  
 > shouldn't matter.

 > Do you have an example of this at hand ?  I fail to see how this  
 > could be happening.

OK, here's a simple example

void fcopy1(float *a, float *b)
{
    int i;
    
    for (i = 0; i < 4; i++)
	a[i] = b[i];
}

The loop on the first pass gets converted to

 for (p = a, q = b, r = a + 4; p < r; )
    *p++ = *q++;

The second pass then gives up trying to calculate the number of loop
iterations since the initial value of p is not a constant.  Now some
algebra could be applied to determine the number of loop iterations
but this would involve tracking register values. 

While it would be desirable to directly determine the number of loop
iterations of such a modified loop, the easiest solution might be to
cache the iteration counts computed on the first pass.

As I mentioned previously, the problem appears with machine having
autoincrement addressing modes.  For example, compiling for the m68k
generates (gcc -O2 -funroll-loops):


	.file	"fcopy1.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 	2
.globl fcopy1
	.type	 fcopy1,@function
fcopy1:
	link.w %a6,#0
	movm.l #0x3030,-(%sp)
	move.l 8(%a6),%a2
	move.l 12(%a6),%a3
	moveq.l #12,%d2
	add.l %a2,%d2
	moveq.l #13,%d0
	add.l %a2,%d0
	moveq.l #13,%d1
	cmp.l %a2,%d0
	jble .L9
	tst.l %d1
	jbeq .L5
	moveq.l #4,%d3
	cmp.l %d1,%d3
	jbge .L9
	moveq.l #8,%d3
	cmp.l %d1,%d3
	jbge .L10
	cmp.l %d1,%d1
	jble .L5
	move.l (%a3)+,(%a2)+
.L10:
	move.l (%a3)+,(%a2)+
.L9:
	move.l (%a3)+,(%a2)+
	cmp.l %a2,%d2
	jblt .L7
	.align 	2
.L5:
	move.l %a2,%a1
	move.l %a3,%a0
	move.l (%a0)+,(%a1)+
	move.l (%a0),(%a1)
	move.l 8(%a3),8(%a2)
	move.l 12(%a3),12(%a2)
	lea (16,%a3),%a3
	lea (16,%a2),%a2
	cmp.l %a2,%d2
	jbge .L5
.L7:
	movm.l (%sp)+,#0xc0c
	unlk %a6
	rts
.Lfe1:
	.size	 fcopy1,.Lfe1-fcopy1
	.ident	"GCC: (GNU) egcs-2.92.15 19981018 (gcc2 ss-980609 experimental)"

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18 12:43     ` Mark Mitchell
@ 1998-10-18 18:52       ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-18 18:52 UTC (permalink / raw)
  To: mark; +Cc: m.hayes, toon, law, egcs

Mark Mitchell writes:
 > Or the second pass could learn to recongize some of these idioms.
 > For example, you seem to imply that if I write:
 > 
 >   int* i;
 >   int* j;
 >   for (i = j ; i < j + 4; ++j)
 >     ;
 > 
 > say, that loop can't figure out the number of iterations.  A little
 > algebra might be helpful here, since this is a reasonably common
 > idiom, especially in C++ using STL, and especially where `4' is
 > replaced by `k'.  So much so that if no-one else does this, I'll
 > probably get around to it.

OK, this patch (with maybe a little strengthening) should handle the 
very simple case.

Index: unroll.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v
retrieving revision 1.32
diff -c -3 -p -r1.32 unroll.c
*** unroll.c	1998/10/14 09:02:52	1.32
--- unroll.c	1998/10/18 22:02:59
*************** loop_iterations (loop_start, loop_end)
*** 3499,3504 ****
--- 3499,3508 ----
  		  if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST
  		      && CONSTANT_P (XEXP (note, 0)))
  		    comparison_value = XEXP (note, 0);
+ 		  /* May have to check that we have the sum of a register
+ 		     and a constant.  */
+ 		  else if (GET_CODE (SET_SRC (set)) == PLUS)
+ 		    comparison_value = SET_SRC (set);
  		}
  	      break;
  	    }
*************** loop_iterations (loop_start, loop_end)
*** 3535,3544 ****
      }
    else if (GET_CODE (initial_value) != CONST_INT)
      {
!       if (loop_dump_stream)
! 	fprintf (loop_dump_stream,
! 		 "Loop unrolling: Initial value not constant.\n");
!       return 0;
      }
    else if (final_value == 0)
      {
--- 3539,3559 ----
      }
    else if (GET_CODE (initial_value) != CONST_INT)
      {
!       if (REG_P (initial_value)
! 	  && GET_CODE (final_value) == PLUS
! 	  && XEXP (final_value, 0) == initial_value
! 	  && CONSTANT_P (XEXP (final_value, 1)))
! 	{
! 	  initial_value = const0_rtx;
! 	  final_value = XEXP (final_value, 1);
! 	}
!       else
! 	{
! 	  if (loop_dump_stream)
! 	    fprintf (loop_dump_stream,
! 		     "Loop unrolling: Initial value not constant.\n");
! 	  return 0;
! 	}
      }
    else if (final_value == 0)
      {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-18  0:15 ` Jeffrey A Law
  1998-10-18  3:11   ` Michael Hayes
@ 1998-10-21 23:18   ` Richard Henderson
  1998-10-22  9:28     ` Jeffrey A Law
  1998-10-22 21:15     ` Toon Moene
  1 sibling, 2 replies; 23+ messages in thread
From: Richard Henderson @ 1998-10-21 23:18 UTC (permalink / raw)
  To: law, Michael Hayes; +Cc: egcs

On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote:
> We've generally gotten better code.

Not lately, as far as I can tell.

It was originally put in because it helped Fortran identify more
GIVs -- probably just due to damebramage wrt move_movables.  The
bits I've looked at since this spring's loop enhancements did not
show an improvement from re-running loop.

Toon should probably comment on this.

r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-21 23:18   ` Richard Henderson
@ 1998-10-22  9:28     ` Jeffrey A Law
  1998-10-22 21:15     ` Toon Moene
  1 sibling, 0 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-10-22  9:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, egcs

  In message < 19981021224801.A1763@dot.cygnus.com >you write:
  > On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote:
  > > We've generally gotten better code.
  > 
  > Not lately, as far as I can tell.
I benchmarked it after your big loop change and it was still improving code
on the PA.

jeff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-21 23:18   ` Richard Henderson
  1998-10-22  9:28     ` Jeffrey A Law
@ 1998-10-22 21:15     ` Toon Moene
  1998-10-23  2:57       ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Toon Moene @ 1998-10-22 21:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, Michael Hayes, egcs

[ Rerunning loop optimisation pass ... ]

On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote:
> We've generally gotten better code.

>  Not lately, as far as I can tell.
>
>  It was originally put in because it helped Fortran
>  identify more GIVs -- probably just due to damebramage
>  wrt move_movables.  The bits I've looked at since this
>  spring's loop enhancements did not show an improvement
>  from re-running loop.
>
>  Toon should probably comment on this.

Well, I'd love to, but since over a week the CVS'd snapshot builds  
die over here when compiling _bb during stage 2. (Note that that is  
using the stage 2 compiler, so probably this one is miscompiled by  
the stage 1 compiler).

The interesting thing about loop unrolling in this context is, that  
we deliberately moved it to *after* the second loop opt phase  
because it suffered badly from GIVs resulting from loop unrolling  
being found and reduced by the second loop opt pass.  Now that GIV  
combining is in full force, this might not be necessary anymore:

      DO I = 1, N, 4
         X(I  ) = X(I  ) + Y(I  )
         X(I+1) = X(I+1) + Y(I+1)
         X(I+2) = X(I+2) + Y(I+2)
         X(I+3) = X(I+3) + Y(I+3)
      ENDDO

would previously lead to 8 registers being used for addressing,  
against 2 now.

Cheers,
Toon.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23  2:57           ` Jeffrey A Law
@ 1998-10-23  2:57             ` Michael Hayes
  1998-10-25 13:53               ` Problem compiling python Hasan Diwan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1998-10-23  2:57 UTC (permalink / raw)
  To: law; +Cc: Michael Hayes, Toon Moene, Richard Henderson, egcs

Jeffrey A Law writes:
 > Does this latest version subsume your earlier unroll.c changes?

Yep, this is the all-singing, all-dancing new one.  It should patch
against current unroll.c

 > [ Still catching up after being away for a few days... ]

Things are little quiet when you go away!

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23  2:57       ` Michael Hayes
@ 1998-10-23  2:57         ` Michael Hayes
  1998-10-23  2:57           ` Jeffrey A Law
  1998-11-04 20:47           ` Re-run of loop pass Joern Rennecke
  1998-10-23 17:45         ` Toon Moene
  1 sibling, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-23  2:57 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Toon Moene, Richard Henderson, law, egcs

Michael Hayes writes:
 > I've done some experimentation with this code snippet.  With
 > -funroll-loops or BCT optimisation the generated code is _attrocious_
 > since the constant iteration count cannot be determined.  The patch at
 > the end seems to fix this problem...

I've come up with a better patch that tries to determine what the
expressions for the loop initial and final values are if they are not
constant.  If the initial and final expressions are the sum of an
equivalent base register and some constants, then the initial and
final values are replaced with the constants.  The number of loop
iterations is now simply calculable and thus the loop unroll and loop
BCT optimisations are happy.

Michael.

Index: unroll.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v
retrieving revision 1.32
diff -c -3 -p -r1.32 unroll.c
*** unroll.c	1998/10/14 09:02:52	1.32
--- unroll.c	1998/10/23 01:42:32
*************** final_giv_value (v, loop_start, loop_end
*** 3403,3408 ****
--- 3403,3445 ----
  }
  
  
+ static rtx
+ loop_equiv_value (loop_start, val)
+      rtx loop_start;
+      rtx val;
+ {
+   rtx insn, set;
+   
+   for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn))
+     {
+       if (GET_CODE (insn) == CODE_LABEL)
+ 	break;
+       
+       else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i'
+ 	       && reg_set_p (val, insn))
+ 	{
+ 	  /* We found the last insn before the loop that sets the register.
+ 	     If it sets the entire register, and has a REG_EQUAL note,
+ 	     then use the value of the REG_EQUAL note.  */
+ 	  if ((set = single_set (insn))
+ 		  && (SET_DEST (set) == val))
+ 	    {
+ 	      rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 	      
+ 	      /* Only use the REG_EQUAL note if it is a constant.
+ 		 Other things, divide in particular, will cause
+ 		 problems later if we use them.  */
+ 	      if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST
+ 		  && CONSTANT_P (XEXP (note, 0)))
+ 		val = XEXP (note, 0);
+ 	      val = SET_SRC (set);
+ 	    }
+ 	  break;
+ 	}
+     }
+   return val;
+ }
+ 
  /* Calculate the number of loop iterations.  Returns the exact number of loop
     iterations if it can be calculated, otherwise returns zero.  */
  
*************** loop_iterations (loop_start, loop_end)
*** 3474,3513 ****
       its value from the insns before the start of the loop.  */
  
    if (GET_CODE (comparison_value) == REG && invariant_p (comparison_value))
      {
!       rtx insn, set;
!     
!       for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn))
! 	{
! 	  if (GET_CODE (insn) == CODE_LABEL)
! 	    break;
  
! 	  else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i'
! 		   && reg_set_p (comparison_value, insn))
! 	    {
! 	      /* We found the last insn before the loop that sets the register.
! 		 If it sets the entire register, and has a REG_EQUAL note,
! 		 then use the value of the REG_EQUAL note.  */
! 	      if ((set = single_set (insn))
! 		  && (SET_DEST (set) == comparison_value))
! 		{
! 		  rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
! 
! 		  /* Only use the REG_EQUAL note if it is a constant.
! 		     Other things, divide in particular, will cause
! 		     problems later if we use them.  */
! 		  if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST
! 		      && CONSTANT_P (XEXP (note, 0)))
! 		    comparison_value = XEXP (note, 0);
! 		}
! 	      break;
! 	    }
  	}
      }
  
!   final_value = approx_final_value (comparison_code, comparison_value,
! 				    &unsigned_compare, &compare_dir);
! 
    /* Save the calculated values describing this loop's bounds, in case
       precondition_loop_p will need them later.  These values can not be
       recalculated inside precondition_loop_p because strength reduction
--- 3511,3562 ----
       its value from the insns before the start of the loop.  */
  
    if (GET_CODE (comparison_value) == REG && invariant_p (comparison_value))
+     comparison_value = loop_equiv_value (loop_start, comparison_value);
+ 
+   final_value = approx_final_value (comparison_code, comparison_value,
+ 				    &unsigned_compare, &compare_dir);
+ 
+   if (REG_P (initial_value))
      {
!       rtx temp = final_value;
  
!       /* INITIAL_VALUE = reg1, FINAL_VALUE = reg2 + const, where reg1 != reg2.
! 	 Try to find what reg1 is equivalent to.  Hopefully it will either
! 	 be reg2 or reg2 plus a constant.  */
!       if (GET_CODE (temp) == PLUS)
! 	temp = XEXP (temp, 0);
!       if (REG_P (temp) && REGNO (temp) != REGNO (initial_value))
! 	initial_value = loop_equiv_value (loop_start, initial_value);
!     }
! 
!   /* If have INITIAL_VALUE = reg + const1 and FINAL_VALUE = reg + const2,
!      then replace INITIAL_VALUE with const1 and FINAL_VALUE with const2.  */
!   if ((GET_CODE (initial_value) == REG || GET_CODE (initial_value) == PLUS)
!       && (GET_CODE (final_value) == REG || GET_CODE (final_value) == PLUS))
!     {
!       rtx init_val;
!       rtx fini_val;
!       rtx init_reg;
!       rtx fini_reg;
! 
!       if (GET_CODE (initial_value) == PLUS)
! 	init_val = XEXP (initial_value, 1), init_reg = XEXP (initial_value, 0);
!       else
! 	init_val = const0_rtx, init_reg = initial_value;
! 
!       if (GET_CODE (final_value) == PLUS)
! 	fini_val = XEXP (final_value, 1), fini_reg = XEXP (final_value, 0);
!       else
! 	fini_val = const0_rtx, fini_reg = final_value;
! 
!       if (init_reg == fini_reg)
! 	{
! 	  initial_value = init_val;
! 	  final_value = fini_val;
  	}
      }
  
!   
    /* Save the calculated values describing this loop's bounds, in case
       precondition_loop_p will need them later.  These values can not be
       recalculated inside precondition_loop_p because strength reduction
*************** loop_iterations (loop_start, loop_end)
*** 3554,3568 ****
  		 "Loop unrolling: Final value not constant.\n");
        return 0;
      }
- 
-   /* ?? Final value and initial value do not have to be constants.
-      Only their difference has to be constant.  When the iteration variable
-      is an array address, the final value and initial value might both
-      be addresses with the same base but different constant offsets.
-      Final value must be invariant for this to work.
- 
-      To do this, need some way to find the values of registers which are
-      invariant.  */
  
    /* Final_larger is 1 if final larger, 0 if they are equal, otherwise -1.  */
    if (unsigned_compare)
--- 3603,3608 ----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23  2:57         ` Michael Hayes
@ 1998-10-23  2:57           ` Jeffrey A Law
  1998-10-23  2:57             ` Michael Hayes
  1998-11-04 20:47           ` Re-run of loop pass Joern Rennecke
  1 sibling, 1 reply; 23+ messages in thread
From: Jeffrey A Law @ 1998-10-23  2:57 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Toon Moene, Richard Henderson, egcs

  In message < 13871.53637.686739.360606@ongaonga.elec.canterbury.ac.nz >you writ
e:
  > Michael Hayes writes:
  >  > I've done some experimentation with this code snippet.  With
  >  > -funroll-loops or BCT optimisation the generated code is _attrocious_
  >  > since the constant iteration count cannot be determined.  The patch at
  >  > the end seems to fix this problem...
  > 
  > I've come up with a better patch that tries to determine what the
  > expressions for the loop initial and final values are if they are not
  > constant.  If the initial and final expressions are the sum of an
  > equivalent base register and some constants, then the initial and
  > final values are replaced with the constants.  The number of loop
  > iterations is now simply calculable and thus the loop unroll and loop
  > BCT optimisations are happy.
Does this latest version subsume your earlier unroll.c changes?

[ Still catching up after being away for a few days... ]

jeff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-22 21:15     ` Toon Moene
@ 1998-10-23  2:57       ` Michael Hayes
  1998-10-23  2:57         ` Michael Hayes
  1998-10-23 17:45         ` Toon Moene
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-23  2:57 UTC (permalink / raw)
  To: Toon Moene; +Cc: Richard Henderson, law, Michael Hayes, egcs

Toon Moene writes:
 > [ Rerunning loop optimisation pass ... ]
 > 
 > The interesting thing about loop unrolling in this context is, that  
 > we deliberately moved it to *after* the second loop opt phase  
 > because it suffered badly from GIVs resulting from loop unrolling  
 > being found and reduced by the second loop opt pass.  Now that GIV  
 > combining is in full force, this might not be necessary anymore:
 > 
 >       DO I = 1, N, 4
 >          X(I  ) = X(I  ) + Y(I  )
 >          X(I+1) = X(I+1) + Y(I+1)
 >          X(I+2) = X(I+2) + Y(I+2)
 >          X(I+3) = X(I+3) + Y(I+3)
 >       ENDDO
 > 
 > would previously lead to 8 registers being used for addressing,  
 > against 2 now.

I've done some experimentation with this code snippet.  With
-funroll-loops or BCT optimisation the generated code is _attrocious_
since the constant iteration count cannot be determined.  The patch at
the end seems to fix this problem (it could be generalised but for
most cases this may not be necessary).  Could someone test this on a
host machine with autoincrement addressing (such as the 68k)?

With my patch applied, I've observed the following address register
usage:

6 addr regs  -O2 -funroll-loops
4 addr regs  -O2 -funroll-loops -fno-rerun-loop-opt
2 addr regs  -O2 
2 addr regs  -O2 -fno-rerun-loop-opt

Michael.

Index: unroll.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v
retrieving revision 1.32
diff -c -3 -p -r1.32 unroll.c
*** unroll.c	1998/10/14 09:02:52	1.32
--- unroll.c	1998/10/22 23:00:52
*************** loop_iterations (loop_start, loop_end)
*** 3499,3504 ****
--- 3499,3508 ----
  		  if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST
  		      && CONSTANT_P (XEXP (note, 0)))
  		    comparison_value = XEXP (note, 0);
+ 		  /* May have to check that we have the sum of a register
+ 		     and a constant.  */
+ 		  else if (GET_CODE (SET_SRC (set)) == PLUS)
+ 		    comparison_value = SET_SRC (set);
  		}
  	      break;
  	    }
*************** loop_iterations (loop_start, loop_end)
*** 3508,3513 ****
--- 3512,3581 ----
    final_value = approx_final_value (comparison_code, comparison_value,
  				    &unsigned_compare, &compare_dir);
  
+   /* initial_value = reg1, final_value = reg2 + const
+      Try to find what reg1 is equivalent to.  */
+   if (REG_P (initial_value)
+       && GET_CODE (final_value) == PLUS
+       && CONSTANT_P (XEXP (final_value, 1))
+       && XEXP (final_value, 0) != initial_value)
+     {
+       rtx insn, set;
+       
+       for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn))
+ 	{
+ 	  if (GET_CODE (insn) == CODE_LABEL)
+ 	    break;
+ 	      
+ 	  else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i'
+ 		   && reg_set_p (initial_value, insn))
+ 	    {
+ 	      /* We found the last insn before the loop that sets the
+ 		 register.  If it sets the entire register, and has a
+ 		 REG_EQUAL note, then use, the value of the REG_EQUAL
+ 		 note.  */
+ 	      if ((set = single_set (insn))
+ 		  && (SET_DEST (set) == initial_value))
+ 		{
+ 		  rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 		  
+ 		  /* Only use the REG_EQUAL note if it is a constant.
+ 		     Other things, divide in particular, will cause
+ 		     problems later if we use them.  */
+ 		  if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST
+ 		      && CONSTANT_P (XEXP (note, 0)))
+ 		    initial_value = XEXP (note, 0);
+ 		  /* May have to check that we have the sum of a register
+ 		     and a constant.  */
+ 		  else if (GET_CODE (SET_SRC (set)) == PLUS)
+ 		    initial_value = SET_SRC (set);
+ 		}
+ 	      break;
+ 	    }
+ 	}
+     }
+   
+   /* initial_value = reg + const1, final_value = reg + const2 */
+   if (GET_CODE (initial_value) == PLUS
+       && GET_CODE (final_value) == PLUS
+       && CONSTANT_P (XEXP (initial_value, 1))
+       && CONSTANT_P (XEXP (final_value, 1))
+       && XEXP (final_value, 0) == XEXP (initial_value, 0))
+     {
+       final_value = GEN_INT (INTVAL (XEXP (final_value, 1)) 
+ 			     - INTVAL (XEXP (initial_value, 1)));
+       initial_value = const0_rtx;
+     }
+   
+   /* initial_value = reg, final_value = reg + const */
+   if (REG_P (initial_value)
+       && GET_CODE (final_value) == PLUS
+       && CONSTANT_P (XEXP (final_value, 1))
+       && XEXP (final_value, 0) == initial_value)
+     {
+       initial_value = const0_rtx;
+       final_value = XEXP (final_value, 1);
+     }
+   
    /* Save the calculated values describing this loop's bounds, in case
       precondition_loop_p will need them later.  These values can not be
       recalculated inside precondition_loop_p because strength reduction


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23  2:57       ` Michael Hayes
  1998-10-23  2:57         ` Michael Hayes
@ 1998-10-23 17:45         ` Toon Moene
  1998-10-23 17:45           ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Toon Moene @ 1998-10-23 17:45 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Richard Henderson, law, egcs

I wrote:

T > [ Rerunning loop optimisation pass ... ]

T > The interesting thing about loop unrolling in this context is, that  
T > we deliberately moved it to *after* the second loop opt phase
T > because it suffered badly from GIVs resulting from loop unrolling  
T > being found and reduced by the second loop opt pass.  Now that GIV  
T > combining is in full force, this might not be necessary anymore:

T >       DO I = 1, N, 4
T >          X(I  ) = X(I  ) + Y(I  )
T >          X(I+1) = X(I+1) + Y(I+1)
T >          X(I+2) = X(I+2) + Y(I+2)
T >          X(I+3) = X(I+3) + Y(I+3)
T >       ENDDO

T > would previously lead to 8 registers being used for addressing,  
T > against 2 now.

Michael:

>  I've done some experimentation with this code snippet.
>  With -funroll-loops or BCT optimisation the generated
>  code is _attrocious_ since the constant iteration count
>  cannot be determined.  The patch at the end seems to fix
>  this problem (it could be generalised but for most cases
>  this may not be necessary).  Could someone test this on
>  a host machine with autoincrement addressing (such as
>  the 68k)?

I think we're talking at cross purposes - the above code doesn't  
have a constant iteration count.  The iteration count is N / 4 iff N  
% 4 == 0 (in fact, the code as shown assumes this).  The code was  
just meant to show why we're running loop unrolling *after* the  
second pass through loop opt ...

If -funroll-loops nevertheless generates worse code, than that's  
due to something I haven't foreseen.

HTH,
Toon.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23 17:45         ` Toon Moene
@ 1998-10-23 17:45           ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1998-10-23 17:45 UTC (permalink / raw)
  To: Toon Moene; +Cc: Michael Hayes, Richard Henderson, law, egcs

Toon Moene writes:

 > I think we're talking at cross purposes - the above code doesn't  
 > have a constant iteration count.  The iteration count is N / 4 iff N  
 > % 4 == 0 (in fact, the code as shown assumes this).  The code was  
 > just meant to show why we're running loop unrolling *after* the  
 > second pass through loop opt ...

Oops, it's 15+ years since I used to program in Fortran.  I assumed N
was a constant.

 > If -funroll-loops nevertheless generates worse code, than that's  
 > due to something I haven't foreseen.

It only generates poorer code for loops with a constant iteration
count on machines with autoincrement addressing when using loop
unrolling or BCT optimisations.  The same problem is exhibited on a
single loop pass for examples such as

void fcopy1(float *a, float *b)
{
  float *c;
    
  for (c = a; c < a + 4; )
    *c++ = *b++;
}

Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Problem compiling python
  1998-10-23  2:57             ` Michael Hayes
@ 1998-10-25 13:53               ` Hasan Diwan
  1998-10-30  2:40                 ` Alexandre Oliva
  0 siblings, 1 reply; 23+ messages in thread
From: Hasan Diwan @ 1998-10-25 13:53 UTC (permalink / raw)
  To: egcs

Ladies and Gentlemen:
	I am having a problem compiling python versions 1.4, 1.5, and 1.5.1. I think it is due to egcs. The error is:
/usr/include/__math.h: in function 'floatsleep':
/usr/include/__math.h:277:'asm' needs too many reloads
I will have a look at the offending header file when I get the time to do so. I am using glibc 2.0.6, if that is of concern.
-- 
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i

iQCVAwUBNjOW3Q6sg/oMyISxAQH+RQP8D+XAt8+Fy7Yol8XJoi/dwd4IGuKEI9ur
A853ynSWcnj/2zbtmOZZ34u8Fn/CKp0kLoReDWyF4wvS2j5queD1+7/JIdxkKVXn
IYwZXEPtuCIyFwxKq/aZsOPDphFZgRdMY36wZfM1pBvmJBI0TUHdzBv6Cbhwqmnj
T0W8R77Hijc=
=jwz3
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Problem compiling python
  1998-10-25 13:53               ` Problem compiling python Hasan Diwan
@ 1998-10-30  2:40                 ` Alexandre Oliva
  0 siblings, 0 replies; 23+ messages in thread
From: Alexandre Oliva @ 1998-10-30  2:40 UTC (permalink / raw)
  To: Hasan Diwan; +Cc: egcs

Hasan Diwan <siliconjackal@iname.com> writes:

> /usr/include/__math.h: in function 'floatsleep':
> /usr/include/__math.h:277:'asm' needs too many reloads
> I will have a look at the offending header file when I get the time to do so. I am using glibc 2.0.6, if that is of concern.

I'm not sure which release of egcs you're using, but recent releases
have started to flag conditions that would cause incorrect code to be
silently generated for glibc.  You might want to upgrade glibc to a
newer release...

-- 
Alexandre Oliva
mailto:oliva@dcc.unicamp.br mailto:oliva@gnu.org mailto:aoliva@acm.org
http://www.dcc.unicamp.br/~oliva
Universidade Estadual de Campinas, SP, Brasil


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-10-23  2:57         ` Michael Hayes
  1998-10-23  2:57           ` Jeffrey A Law
@ 1998-11-04 20:47           ` Joern Rennecke
  1998-11-05  0:28             ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Joern Rennecke @ 1998-11-04 20:47 UTC (permalink / raw)
  To: Michael Hayes; +Cc: m.hayes, toon, rth, law, egcs

> I've come up with a better patch that tries to determine what the
> expressions for the loop initial and final values are if they are not
> constant.  If the initial and final expressions are the sum of an
> equivalent base register and some constants, then the initial and
> final values are replaced with the constants.  The number of loop
> iterations is now simply calculable and thus the loop unroll and loop
> BCT optimisations are happy.

I have the suspicion that this will miscompile this testcase:

int
f (unsigned start)
{
  int i, j;

  for (j = 0, i = start; i < start + 5; i++)
    j++;
  return j;
}

int
main ()
{
  int i;

  i = f (0U-1);
  if (i)
    abort ();
  exit (0);
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-11-05  0:28             ` Michael Hayes
@ 1998-11-04 20:47               ` Joern Rennecke
  1998-11-05  5:01                 ` Michael Hayes
  0 siblings, 1 reply; 23+ messages in thread
From: Joern Rennecke @ 1998-11-04 20:47 UTC (permalink / raw)
  To: Michael Hayes; +Cc: amylaar, m.hayes, toon, rth, law, egcs

> On a 2's complement machine, start is UINT_MAX, start + 5 is 4
> due to unsigned arithmetic, i is assigned -1, and j is assigned 0.
> The test fails immediately since an unsigned compare is performed
> and (unsigned int) -1 is not less than 4.  f() thus returns 0.

So what prevents yout patched compiler from unrolling he test case?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-11-04 20:47           ` Re-run of loop pass Joern Rennecke
@ 1998-11-05  0:28             ` Michael Hayes
  1998-11-04 20:47               ` Joern Rennecke
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1998-11-05  0:28 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Michael Hayes, toon, rth, law, egcs

Joern Rennecke writes:
 > I have the suspicion that this will miscompile this testcase:
 > 
 > int
 > f (unsigned start)
 > {
 >   int i, j;
 > 
 >   for (j = 0, i = start; i < start + 5; i++)
 >     j++;
 >   return j;
 > }
 > 
 > int
 > main ()
 > {
 >   int i;
 > 
 >   i = f (0U-1);
 >   if (i)
 >     abort ();
 >   exit (0);
 > }

The generated code executes correctly on the C4x.

On a 2's complement machine, start is UINT_MAX, start + 5 is 4
due to unsigned arithmetic, i is assigned -1, and j is assigned 0.
The test fails immediately since an unsigned compare is performed
and (unsigned int) -1 is not less than 4.  f() thus returns 0.

Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re-run of loop pass
  1998-11-04 20:47               ` Joern Rennecke
@ 1998-11-05  5:01                 ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1998-11-05  5:01 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Michael Hayes, toon, rth, law, egcs

Joern Rennecke writes:
 > > On a 2's complement machine, start is UINT_MAX, start + 5 is 4
 > > due to unsigned arithmetic, i is assigned -1, and j is assigned 0.
 > > The test fails immediately since an unsigned compare is performed
 > > and (unsigned int) -1 is not less than 4.  f() thus returns 0.
 > 
 > So what prevents yout patched compiler from unrolling he test case?

The loop gets completely unrolled but there is still a guard
instruction at the top of the loop to bypass the loop if necessary.

Here's the code for the c4x.

_f:
	ldiu	ar2,r0
	addi	5,r0
	cmpi3	r0,ar2
	bhsd	L3        <= unsigned delayed branch  (higher or same)
	ldiu	0,r1
	nop
	nop
	ldiu	5,r1
L3:
	ldiu	r1,r0
	rets

(Hmmm, looking at this code, there's a grand oppoprtunity for a
conditional load that was missed.)

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~1998-11-05  5:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-10-17 19:32 Re-run of loop pass Michael Hayes
1998-10-18  0:15 ` Jeffrey A Law
1998-10-18  3:11   ` Michael Hayes
1998-10-18 11:04     ` David Edelsohn
1998-10-18 12:43     ` Mark Mitchell
1998-10-18 18:52       ` Michael Hayes
1998-10-21 23:18   ` Richard Henderson
1998-10-22  9:28     ` Jeffrey A Law
1998-10-22 21:15     ` Toon Moene
1998-10-23  2:57       ` Michael Hayes
1998-10-23  2:57         ` Michael Hayes
1998-10-23  2:57           ` Jeffrey A Law
1998-10-23  2:57             ` Michael Hayes
1998-10-25 13:53               ` Problem compiling python Hasan Diwan
1998-10-30  2:40                 ` Alexandre Oliva
1998-11-04 20:47           ` Re-run of loop pass Joern Rennecke
1998-11-05  0:28             ` Michael Hayes
1998-11-04 20:47               ` Joern Rennecke
1998-11-05  5:01                 ` Michael Hayes
1998-10-23 17:45         ` Toon Moene
1998-10-23 17:45           ` Michael Hayes
1998-10-18  6:11 ` Toon Moene
1998-10-18 17:19   ` Michael Hayes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).