* Re-run of loop pass @ 1998-10-17 19:32 Michael Hayes 1998-10-18 0:15 ` Jeffrey A Law 1998-10-18 6:11 ` Toon Moene 0 siblings, 2 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-17 19:32 UTC (permalink / raw) To: egcs; +Cc: Michael Hayes What's the philosophy of rerunning the loop optimisation pass twice? I've noticed that this can generate poorer code when unrolling loops or using the bct optimisation if a BIV is eliminated on the first pass. On the second pass, loop_iterations is often unable to determine the iteration count for loops with limits that should be able to be determined at compile time. Loop unrolling or the bct optimisation then makes a dog's breakfast of things. Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-17 19:32 Re-run of loop pass Michael Hayes @ 1998-10-18 0:15 ` Jeffrey A Law 1998-10-18 3:11 ` Michael Hayes 1998-10-21 23:18 ` Richard Henderson 1998-10-18 6:11 ` Toon Moene 1 sibling, 2 replies; 23+ messages in thread From: Jeffrey A Law @ 1998-10-18 0:15 UTC (permalink / raw) To: Michael Hayes; +Cc: egcs In message < 13865.9178.455954.70561@ongaonga.elec.canterbury.ac.nz >you write: > I've noticed that this can generate poorer code when unrolling loops > or using the bct optimisation if a BIV is eliminated on the first > pass. > > On the second pass, loop_iterations is often unable to determine the > iteration count for loops with limits that should be able to be > determined at compile time. Loop unrolling or the bct optimisation > then makes a dog's breakfast of things. We've generally gotten better code. You should investigate why the second loop pass is unable to determine loop iteration counts. jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 0:15 ` Jeffrey A Law @ 1998-10-18 3:11 ` Michael Hayes 1998-10-18 11:04 ` David Edelsohn 1998-10-18 12:43 ` Mark Mitchell 1998-10-21 23:18 ` Richard Henderson 1 sibling, 2 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-18 3:11 UTC (permalink / raw) To: law; +Cc: Michael Hayes, egcs Jeffrey A Law writes: > You should investigate why the second loop pass is unable to determine > loop iteration counts. Consider a simple dot product function like the following: float foo(float *a, float *b) { int i; float result = 0.0; for (i = 0; i < 4; i++) result += a[i] * b[i]; return result; } The first pass often eliminates the BIV associated with i and replaces the end of loop test i < 4 with a comparison of the incremented pointer associated with a + i with a + 4 (I think the elimination is only performed for targets with autoincrement addressing modes). The second pass then finds that the initial value of BIV associated with a is not a constant and thus cannot determine the iteration count. I imagine that the information regarding iteration counts should be gathered on the first pass and then used on the second pass if needed. Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 3:11 ` Michael Hayes @ 1998-10-18 11:04 ` David Edelsohn 1998-10-18 12:43 ` Mark Mitchell 1 sibling, 0 replies; 23+ messages in thread From: David Edelsohn @ 1998-10-18 11:04 UTC (permalink / raw) To: Michael Hayes; +Cc: law, egcs >>>>> Michael Hayes writes: Michael> The first pass often eliminates the BIV associated with i and replaces Michael> the end of loop test i < 4 with a comparison of the incremented Michael> pointer associated with a + i with a + 4 (I think the elimination is Michael> only performed for targets with autoincrement addressing modes). The Michael> second pass then finds that the initial value of BIV associated with a Michael> is not a constant and thus cannot determine the iteration count. I think that this is addressed by the following comment in unroll.c:loop_iterations() /* ?? Final value and initial value do not have to be constants. Only their difference has to be constant. When the iteration variable is an array address, the final value and initial value might both be addresses with the same base but different constant offsets. Final value must be invariant for this to work. To do this, need some way to find the values of registers which are invariant. */ I guess this is not yet implemented. David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 3:11 ` Michael Hayes 1998-10-18 11:04 ` David Edelsohn @ 1998-10-18 12:43 ` Mark Mitchell 1998-10-18 18:52 ` Michael Hayes 1 sibling, 1 reply; 23+ messages in thread From: Mark Mitchell @ 1998-10-18 12:43 UTC (permalink / raw) To: m.hayes; +Cc: law, m.hayes, egcs >>>>> "Michael" == Michael Hayes <m.hayes@elec.canterbury.ac.nz> writes: Michael> The first pass often eliminates the BIV associated with i Michael> and replaces the end of loop test i < 4 with a comparison Michael> of the incremented pointer associated with a + i with a + Michael> 4 (I think the elimination is only performed for targets Michael> with autoincrement addressing modes). The second pass Michael> then finds that the initial value of BIV associated with Michael> a is not a constant and thus cannot determine the Michael> iteration count. Or the second pass could learn to recongize some of these idioms. For example, you seem to imply that if I write: int* i; int* j; for (i = j ; i < j + 4; ++j) ; say, that loop can't figure out the number of iterations. A little algebra might be helpful here, since this is a reasonably common idiom, especially in C++ using STL, and especially where `4' is replaced by `k'. So much so that if no-one else does this, I'll probably get around to it. -- Mark Mitchell mark@markmitchell.com Mark Mitchell Consulting http://www.markmitchell.com ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 12:43 ` Mark Mitchell @ 1998-10-18 18:52 ` Michael Hayes 0 siblings, 0 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-18 18:52 UTC (permalink / raw) To: mark; +Cc: m.hayes, toon, law, egcs Mark Mitchell writes: > Or the second pass could learn to recongize some of these idioms. > For example, you seem to imply that if I write: > > int* i; > int* j; > for (i = j ; i < j + 4; ++j) > ; > > say, that loop can't figure out the number of iterations. A little > algebra might be helpful here, since this is a reasonably common > idiom, especially in C++ using STL, and especially where `4' is > replaced by `k'. So much so that if no-one else does this, I'll > probably get around to it. OK, this patch (with maybe a little strengthening) should handle the very simple case. Index: unroll.c =================================================================== RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v retrieving revision 1.32 diff -c -3 -p -r1.32 unroll.c *** unroll.c 1998/10/14 09:02:52 1.32 --- unroll.c 1998/10/18 22:02:59 *************** loop_iterations (loop_start, loop_end) *** 3499,3504 **** --- 3499,3508 ---- if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST && CONSTANT_P (XEXP (note, 0))) comparison_value = XEXP (note, 0); + /* May have to check that we have the sum of a register + and a constant. */ + else if (GET_CODE (SET_SRC (set)) == PLUS) + comparison_value = SET_SRC (set); } break; } *************** loop_iterations (loop_start, loop_end) *** 3535,3544 **** } else if (GET_CODE (initial_value) != CONST_INT) { ! if (loop_dump_stream) ! fprintf (loop_dump_stream, ! "Loop unrolling: Initial value not constant.\n"); ! return 0; } else if (final_value == 0) { --- 3539,3559 ---- } else if (GET_CODE (initial_value) != CONST_INT) { ! if (REG_P (initial_value) ! && GET_CODE (final_value) == PLUS ! && XEXP (final_value, 0) == initial_value ! && CONSTANT_P (XEXP (final_value, 1))) ! { ! initial_value = const0_rtx; ! final_value = XEXP (final_value, 1); ! } ! else ! { ! if (loop_dump_stream) ! fprintf (loop_dump_stream, ! "Loop unrolling: Initial value not constant.\n"); ! return 0; ! } } else if (final_value == 0) { ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 0:15 ` Jeffrey A Law 1998-10-18 3:11 ` Michael Hayes @ 1998-10-21 23:18 ` Richard Henderson 1998-10-22 9:28 ` Jeffrey A Law 1998-10-22 21:15 ` Toon Moene 1 sibling, 2 replies; 23+ messages in thread From: Richard Henderson @ 1998-10-21 23:18 UTC (permalink / raw) To: law, Michael Hayes; +Cc: egcs On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote: > We've generally gotten better code. Not lately, as far as I can tell. It was originally put in because it helped Fortran identify more GIVs -- probably just due to damebramage wrt move_movables. The bits I've looked at since this spring's loop enhancements did not show an improvement from re-running loop. Toon should probably comment on this. r~ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-21 23:18 ` Richard Henderson @ 1998-10-22 9:28 ` Jeffrey A Law 1998-10-22 21:15 ` Toon Moene 1 sibling, 0 replies; 23+ messages in thread From: Jeffrey A Law @ 1998-10-22 9:28 UTC (permalink / raw) To: Richard Henderson; +Cc: Michael Hayes, egcs In message < 19981021224801.A1763@dot.cygnus.com >you write: > On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote: > > We've generally gotten better code. > > Not lately, as far as I can tell. I benchmarked it after your big loop change and it was still improving code on the PA. jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-21 23:18 ` Richard Henderson 1998-10-22 9:28 ` Jeffrey A Law @ 1998-10-22 21:15 ` Toon Moene 1998-10-23 2:57 ` Michael Hayes 1 sibling, 1 reply; 23+ messages in thread From: Toon Moene @ 1998-10-22 21:15 UTC (permalink / raw) To: Richard Henderson; +Cc: law, Michael Hayes, egcs [ Rerunning loop optimisation pass ... ] On Sun, Oct 18, 1998 at 01:14:51AM -0600, Jeffrey A Law wrote: > We've generally gotten better code. > Not lately, as far as I can tell. > > It was originally put in because it helped Fortran > identify more GIVs -- probably just due to damebramage > wrt move_movables. The bits I've looked at since this > spring's loop enhancements did not show an improvement > from re-running loop. > > Toon should probably comment on this. Well, I'd love to, but since over a week the CVS'd snapshot builds die over here when compiling _bb during stage 2. (Note that that is using the stage 2 compiler, so probably this one is miscompiled by the stage 1 compiler). The interesting thing about loop unrolling in this context is, that we deliberately moved it to *after* the second loop opt phase because it suffered badly from GIVs resulting from loop unrolling being found and reduced by the second loop opt pass. Now that GIV combining is in full force, this might not be necessary anymore: DO I = 1, N, 4 X(I ) = X(I ) + Y(I ) X(I+1) = X(I+1) + Y(I+1) X(I+2) = X(I+2) + Y(I+2) X(I+3) = X(I+3) + Y(I+3) ENDDO would previously lead to 8 registers being used for addressing, against 2 now. Cheers, Toon. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-22 21:15 ` Toon Moene @ 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Michael Hayes 1998-10-23 17:45 ` Toon Moene 0 siblings, 2 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-23 2:57 UTC (permalink / raw) To: Toon Moene; +Cc: Richard Henderson, law, Michael Hayes, egcs Toon Moene writes: > [ Rerunning loop optimisation pass ... ] > > The interesting thing about loop unrolling in this context is, that > we deliberately moved it to *after* the second loop opt phase > because it suffered badly from GIVs resulting from loop unrolling > being found and reduced by the second loop opt pass. Now that GIV > combining is in full force, this might not be necessary anymore: > > DO I = 1, N, 4 > X(I ) = X(I ) + Y(I ) > X(I+1) = X(I+1) + Y(I+1) > X(I+2) = X(I+2) + Y(I+2) > X(I+3) = X(I+3) + Y(I+3) > ENDDO > > would previously lead to 8 registers being used for addressing, > against 2 now. I've done some experimentation with this code snippet. With -funroll-loops or BCT optimisation the generated code is _attrocious_ since the constant iteration count cannot be determined. The patch at the end seems to fix this problem (it could be generalised but for most cases this may not be necessary). Could someone test this on a host machine with autoincrement addressing (such as the 68k)? With my patch applied, I've observed the following address register usage: 6 addr regs -O2 -funroll-loops 4 addr regs -O2 -funroll-loops -fno-rerun-loop-opt 2 addr regs -O2 2 addr regs -O2 -fno-rerun-loop-opt Michael. Index: unroll.c =================================================================== RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v retrieving revision 1.32 diff -c -3 -p -r1.32 unroll.c *** unroll.c 1998/10/14 09:02:52 1.32 --- unroll.c 1998/10/22 23:00:52 *************** loop_iterations (loop_start, loop_end) *** 3499,3504 **** --- 3499,3508 ---- if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST && CONSTANT_P (XEXP (note, 0))) comparison_value = XEXP (note, 0); + /* May have to check that we have the sum of a register + and a constant. */ + else if (GET_CODE (SET_SRC (set)) == PLUS) + comparison_value = SET_SRC (set); } break; } *************** loop_iterations (loop_start, loop_end) *** 3508,3513 **** --- 3512,3581 ---- final_value = approx_final_value (comparison_code, comparison_value, &unsigned_compare, &compare_dir); + /* initial_value = reg1, final_value = reg2 + const + Try to find what reg1 is equivalent to. */ + if (REG_P (initial_value) + && GET_CODE (final_value) == PLUS + && CONSTANT_P (XEXP (final_value, 1)) + && XEXP (final_value, 0) != initial_value) + { + rtx insn, set; + + for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn)) + { + if (GET_CODE (insn) == CODE_LABEL) + break; + + else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i' + && reg_set_p (initial_value, insn)) + { + /* We found the last insn before the loop that sets the + register. If it sets the entire register, and has a + REG_EQUAL note, then use, the value of the REG_EQUAL + note. */ + if ((set = single_set (insn)) + && (SET_DEST (set) == initial_value)) + { + rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX); + + /* Only use the REG_EQUAL note if it is a constant. + Other things, divide in particular, will cause + problems later if we use them. */ + if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST + && CONSTANT_P (XEXP (note, 0))) + initial_value = XEXP (note, 0); + /* May have to check that we have the sum of a register + and a constant. */ + else if (GET_CODE (SET_SRC (set)) == PLUS) + initial_value = SET_SRC (set); + } + break; + } + } + } + + /* initial_value = reg + const1, final_value = reg + const2 */ + if (GET_CODE (initial_value) == PLUS + && GET_CODE (final_value) == PLUS + && CONSTANT_P (XEXP (initial_value, 1)) + && CONSTANT_P (XEXP (final_value, 1)) + && XEXP (final_value, 0) == XEXP (initial_value, 0)) + { + final_value = GEN_INT (INTVAL (XEXP (final_value, 1)) + - INTVAL (XEXP (initial_value, 1))); + initial_value = const0_rtx; + } + + /* initial_value = reg, final_value = reg + const */ + if (REG_P (initial_value) + && GET_CODE (final_value) == PLUS + && CONSTANT_P (XEXP (final_value, 1)) + && XEXP (final_value, 0) == initial_value) + { + initial_value = const0_rtx; + final_value = XEXP (final_value, 1); + } + /* Save the calculated values describing this loop's bounds, in case precondition_loop_p will need them later. These values can not be recalculated inside precondition_loop_p because strength reduction ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 2:57 ` Michael Hayes @ 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Jeffrey A Law 1998-11-04 20:47 ` Re-run of loop pass Joern Rennecke 1998-10-23 17:45 ` Toon Moene 1 sibling, 2 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-23 2:57 UTC (permalink / raw) To: Michael Hayes; +Cc: Toon Moene, Richard Henderson, law, egcs Michael Hayes writes: > I've done some experimentation with this code snippet. With > -funroll-loops or BCT optimisation the generated code is _attrocious_ > since the constant iteration count cannot be determined. The patch at > the end seems to fix this problem... I've come up with a better patch that tries to determine what the expressions for the loop initial and final values are if they are not constant. If the initial and final expressions are the sum of an equivalent base register and some constants, then the initial and final values are replaced with the constants. The number of loop iterations is now simply calculable and thus the loop unroll and loop BCT optimisations are happy. Michael. Index: unroll.c =================================================================== RCS file: /egcs/carton/cvsfiles/egcs/gcc/unroll.c,v retrieving revision 1.32 diff -c -3 -p -r1.32 unroll.c *** unroll.c 1998/10/14 09:02:52 1.32 --- unroll.c 1998/10/23 01:42:32 *************** final_giv_value (v, loop_start, loop_end *** 3403,3408 **** --- 3403,3445 ---- } + static rtx + loop_equiv_value (loop_start, val) + rtx loop_start; + rtx val; + { + rtx insn, set; + + for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn)) + { + if (GET_CODE (insn) == CODE_LABEL) + break; + + else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i' + && reg_set_p (val, insn)) + { + /* We found the last insn before the loop that sets the register. + If it sets the entire register, and has a REG_EQUAL note, + then use the value of the REG_EQUAL note. */ + if ((set = single_set (insn)) + && (SET_DEST (set) == val)) + { + rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX); + + /* Only use the REG_EQUAL note if it is a constant. + Other things, divide in particular, will cause + problems later if we use them. */ + if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST + && CONSTANT_P (XEXP (note, 0))) + val = XEXP (note, 0); + val = SET_SRC (set); + } + break; + } + } + return val; + } + /* Calculate the number of loop iterations. Returns the exact number of loop iterations if it can be calculated, otherwise returns zero. */ *************** loop_iterations (loop_start, loop_end) *** 3474,3513 **** its value from the insns before the start of the loop. */ if (GET_CODE (comparison_value) == REG && invariant_p (comparison_value)) { ! rtx insn, set; ! ! for (insn = PREV_INSN (loop_start); insn ; insn = PREV_INSN (insn)) ! { ! if (GET_CODE (insn) == CODE_LABEL) ! break; ! else if (GET_RTX_CLASS (GET_CODE (insn)) == 'i' ! && reg_set_p (comparison_value, insn)) ! { ! /* We found the last insn before the loop that sets the register. ! If it sets the entire register, and has a REG_EQUAL note, ! then use the value of the REG_EQUAL note. */ ! if ((set = single_set (insn)) ! && (SET_DEST (set) == comparison_value)) ! { ! rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX); ! ! /* Only use the REG_EQUAL note if it is a constant. ! Other things, divide in particular, will cause ! problems later if we use them. */ ! if (note && GET_CODE (XEXP (note, 0)) != EXPR_LIST ! && CONSTANT_P (XEXP (note, 0))) ! comparison_value = XEXP (note, 0); ! } ! break; ! } } } ! final_value = approx_final_value (comparison_code, comparison_value, ! &unsigned_compare, &compare_dir); ! /* Save the calculated values describing this loop's bounds, in case precondition_loop_p will need them later. These values can not be recalculated inside precondition_loop_p because strength reduction --- 3511,3562 ---- its value from the insns before the start of the loop. */ if (GET_CODE (comparison_value) == REG && invariant_p (comparison_value)) + comparison_value = loop_equiv_value (loop_start, comparison_value); + + final_value = approx_final_value (comparison_code, comparison_value, + &unsigned_compare, &compare_dir); + + if (REG_P (initial_value)) { ! rtx temp = final_value; ! /* INITIAL_VALUE = reg1, FINAL_VALUE = reg2 + const, where reg1 != reg2. ! Try to find what reg1 is equivalent to. Hopefully it will either ! be reg2 or reg2 plus a constant. */ ! if (GET_CODE (temp) == PLUS) ! temp = XEXP (temp, 0); ! if (REG_P (temp) && REGNO (temp) != REGNO (initial_value)) ! initial_value = loop_equiv_value (loop_start, initial_value); ! } ! ! /* If have INITIAL_VALUE = reg + const1 and FINAL_VALUE = reg + const2, ! then replace INITIAL_VALUE with const1 and FINAL_VALUE with const2. */ ! if ((GET_CODE (initial_value) == REG || GET_CODE (initial_value) == PLUS) ! && (GET_CODE (final_value) == REG || GET_CODE (final_value) == PLUS)) ! { ! rtx init_val; ! rtx fini_val; ! rtx init_reg; ! rtx fini_reg; ! ! if (GET_CODE (initial_value) == PLUS) ! init_val = XEXP (initial_value, 1), init_reg = XEXP (initial_value, 0); ! else ! init_val = const0_rtx, init_reg = initial_value; ! ! if (GET_CODE (final_value) == PLUS) ! fini_val = XEXP (final_value, 1), fini_reg = XEXP (final_value, 0); ! else ! fini_val = const0_rtx, fini_reg = final_value; ! ! if (init_reg == fini_reg) ! { ! initial_value = init_val; ! final_value = fini_val; } } ! /* Save the calculated values describing this loop's bounds, in case precondition_loop_p will need them later. These values can not be recalculated inside precondition_loop_p because strength reduction *************** loop_iterations (loop_start, loop_end) *** 3554,3568 **** "Loop unrolling: Final value not constant.\n"); return 0; } - - /* ?? Final value and initial value do not have to be constants. - Only their difference has to be constant. When the iteration variable - is an array address, the final value and initial value might both - be addresses with the same base but different constant offsets. - Final value must be invariant for this to work. - - To do this, need some way to find the values of registers which are - invariant. */ /* Final_larger is 1 if final larger, 0 if they are equal, otherwise -1. */ if (unsigned_compare) --- 3603,3608 ---- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 2:57 ` Michael Hayes @ 1998-10-23 2:57 ` Jeffrey A Law 1998-10-23 2:57 ` Michael Hayes 1998-11-04 20:47 ` Re-run of loop pass Joern Rennecke 1 sibling, 1 reply; 23+ messages in thread From: Jeffrey A Law @ 1998-10-23 2:57 UTC (permalink / raw) To: Michael Hayes; +Cc: Toon Moene, Richard Henderson, egcs In message < 13871.53637.686739.360606@ongaonga.elec.canterbury.ac.nz >you writ e: > Michael Hayes writes: > > I've done some experimentation with this code snippet. With > > -funroll-loops or BCT optimisation the generated code is _attrocious_ > > since the constant iteration count cannot be determined. The patch at > > the end seems to fix this problem... > > I've come up with a better patch that tries to determine what the > expressions for the loop initial and final values are if they are not > constant. If the initial and final expressions are the sum of an > equivalent base register and some constants, then the initial and > final values are replaced with the constants. The number of loop > iterations is now simply calculable and thus the loop unroll and loop > BCT optimisations are happy. Does this latest version subsume your earlier unroll.c changes? [ Still catching up after being away for a few days... ] jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 2:57 ` Jeffrey A Law @ 1998-10-23 2:57 ` Michael Hayes 1998-10-25 13:53 ` Problem compiling python Hasan Diwan 0 siblings, 1 reply; 23+ messages in thread From: Michael Hayes @ 1998-10-23 2:57 UTC (permalink / raw) To: law; +Cc: Michael Hayes, Toon Moene, Richard Henderson, egcs Jeffrey A Law writes: > Does this latest version subsume your earlier unroll.c changes? Yep, this is the all-singing, all-dancing new one. It should patch against current unroll.c > [ Still catching up after being away for a few days... ] Things are little quiet when you go away! Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Problem compiling python 1998-10-23 2:57 ` Michael Hayes @ 1998-10-25 13:53 ` Hasan Diwan 1998-10-30 2:40 ` Alexandre Oliva 0 siblings, 1 reply; 23+ messages in thread From: Hasan Diwan @ 1998-10-25 13:53 UTC (permalink / raw) To: egcs Ladies and Gentlemen: I am having a problem compiling python versions 1.4, 1.5, and 1.5.1. I think it is due to egcs. The error is: /usr/include/__math.h: in function 'floatsleep': /usr/include/__math.h:277:'asm' needs too many reloads I will have a look at the offending header file when I get the time to do so. I am using glibc 2.0.6, if that is of concern. -- -----BEGIN PGP SIGNATURE----- Version: 2.6.3i iQCVAwUBNjOW3Q6sg/oMyISxAQH+RQP8D+XAt8+Fy7Yol8XJoi/dwd4IGuKEI9ur A853ynSWcnj/2zbtmOZZ34u8Fn/CKp0kLoReDWyF4wvS2j5queD1+7/JIdxkKVXn IYwZXEPtuCIyFwxKq/aZsOPDphFZgRdMY36wZfM1pBvmJBI0TUHdzBv6Cbhwqmnj T0W8R77Hijc= =jwz3 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problem compiling python 1998-10-25 13:53 ` Problem compiling python Hasan Diwan @ 1998-10-30 2:40 ` Alexandre Oliva 0 siblings, 0 replies; 23+ messages in thread From: Alexandre Oliva @ 1998-10-30 2:40 UTC (permalink / raw) To: Hasan Diwan; +Cc: egcs Hasan Diwan <siliconjackal@iname.com> writes: > /usr/include/__math.h: in function 'floatsleep': > /usr/include/__math.h:277:'asm' needs too many reloads > I will have a look at the offending header file when I get the time to do so. I am using glibc 2.0.6, if that is of concern. I'm not sure which release of egcs you're using, but recent releases have started to flag conditions that would cause incorrect code to be silently generated for glibc. You might want to upgrade glibc to a newer release... -- Alexandre Oliva mailto:oliva@dcc.unicamp.br mailto:oliva@gnu.org mailto:aoliva@acm.org http://www.dcc.unicamp.br/~oliva Universidade Estadual de Campinas, SP, Brasil ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Jeffrey A Law @ 1998-11-04 20:47 ` Joern Rennecke 1998-11-05 0:28 ` Michael Hayes 1 sibling, 1 reply; 23+ messages in thread From: Joern Rennecke @ 1998-11-04 20:47 UTC (permalink / raw) To: Michael Hayes; +Cc: m.hayes, toon, rth, law, egcs > I've come up with a better patch that tries to determine what the > expressions for the loop initial and final values are if they are not > constant. If the initial and final expressions are the sum of an > equivalent base register and some constants, then the initial and > final values are replaced with the constants. The number of loop > iterations is now simply calculable and thus the loop unroll and loop > BCT optimisations are happy. I have the suspicion that this will miscompile this testcase: int f (unsigned start) { int i, j; for (j = 0, i = start; i < start + 5; i++) j++; return j; } int main () { int i; i = f (0U-1); if (i) abort (); exit (0); } ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-11-04 20:47 ` Re-run of loop pass Joern Rennecke @ 1998-11-05 0:28 ` Michael Hayes 1998-11-04 20:47 ` Joern Rennecke 0 siblings, 1 reply; 23+ messages in thread From: Michael Hayes @ 1998-11-05 0:28 UTC (permalink / raw) To: Joern Rennecke; +Cc: Michael Hayes, toon, rth, law, egcs Joern Rennecke writes: > I have the suspicion that this will miscompile this testcase: > > int > f (unsigned start) > { > int i, j; > > for (j = 0, i = start; i < start + 5; i++) > j++; > return j; > } > > int > main () > { > int i; > > i = f (0U-1); > if (i) > abort (); > exit (0); > } The generated code executes correctly on the C4x. On a 2's complement machine, start is UINT_MAX, start + 5 is 4 due to unsigned arithmetic, i is assigned -1, and j is assigned 0. The test fails immediately since an unsigned compare is performed and (unsigned int) -1 is not less than 4. f() thus returns 0. Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-11-05 0:28 ` Michael Hayes @ 1998-11-04 20:47 ` Joern Rennecke 1998-11-05 5:01 ` Michael Hayes 0 siblings, 1 reply; 23+ messages in thread From: Joern Rennecke @ 1998-11-04 20:47 UTC (permalink / raw) To: Michael Hayes; +Cc: amylaar, m.hayes, toon, rth, law, egcs > On a 2's complement machine, start is UINT_MAX, start + 5 is 4 > due to unsigned arithmetic, i is assigned -1, and j is assigned 0. > The test fails immediately since an unsigned compare is performed > and (unsigned int) -1 is not less than 4. f() thus returns 0. So what prevents yout patched compiler from unrolling he test case? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-11-04 20:47 ` Joern Rennecke @ 1998-11-05 5:01 ` Michael Hayes 0 siblings, 0 replies; 23+ messages in thread From: Michael Hayes @ 1998-11-05 5:01 UTC (permalink / raw) To: Joern Rennecke; +Cc: Michael Hayes, toon, rth, law, egcs Joern Rennecke writes: > > On a 2's complement machine, start is UINT_MAX, start + 5 is 4 > > due to unsigned arithmetic, i is assigned -1, and j is assigned 0. > > The test fails immediately since an unsigned compare is performed > > and (unsigned int) -1 is not less than 4. f() thus returns 0. > > So what prevents yout patched compiler from unrolling he test case? The loop gets completely unrolled but there is still a guard instruction at the top of the loop to bypass the loop if necessary. Here's the code for the c4x. _f: ldiu ar2,r0 addi 5,r0 cmpi3 r0,ar2 bhsd L3 <= unsigned delayed branch (higher or same) ldiu 0,r1 nop nop ldiu 5,r1 L3: ldiu r1,r0 rets (Hmmm, looking at this code, there's a grand oppoprtunity for a conditional load that was missed.) Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Michael Hayes @ 1998-10-23 17:45 ` Toon Moene 1998-10-23 17:45 ` Michael Hayes 1 sibling, 1 reply; 23+ messages in thread From: Toon Moene @ 1998-10-23 17:45 UTC (permalink / raw) To: Michael Hayes; +Cc: Richard Henderson, law, egcs I wrote: T > [ Rerunning loop optimisation pass ... ] T > The interesting thing about loop unrolling in this context is, that T > we deliberately moved it to *after* the second loop opt phase T > because it suffered badly from GIVs resulting from loop unrolling T > being found and reduced by the second loop opt pass. Now that GIV T > combining is in full force, this might not be necessary anymore: T > DO I = 1, N, 4 T > X(I ) = X(I ) + Y(I ) T > X(I+1) = X(I+1) + Y(I+1) T > X(I+2) = X(I+2) + Y(I+2) T > X(I+3) = X(I+3) + Y(I+3) T > ENDDO T > would previously lead to 8 registers being used for addressing, T > against 2 now. Michael: > I've done some experimentation with this code snippet. > With -funroll-loops or BCT optimisation the generated > code is _attrocious_ since the constant iteration count > cannot be determined. The patch at the end seems to fix > this problem (it could be generalised but for most cases > this may not be necessary). Could someone test this on > a host machine with autoincrement addressing (such as > the 68k)? I think we're talking at cross purposes - the above code doesn't have a constant iteration count. The iteration count is N / 4 iff N % 4 == 0 (in fact, the code as shown assumes this). The code was just meant to show why we're running loop unrolling *after* the second pass through loop opt ... If -funroll-loops nevertheless generates worse code, than that's due to something I haven't foreseen. HTH, Toon. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-23 17:45 ` Toon Moene @ 1998-10-23 17:45 ` Michael Hayes 0 siblings, 0 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-23 17:45 UTC (permalink / raw) To: Toon Moene; +Cc: Michael Hayes, Richard Henderson, law, egcs Toon Moene writes: > I think we're talking at cross purposes - the above code doesn't > have a constant iteration count. The iteration count is N / 4 iff N > % 4 == 0 (in fact, the code as shown assumes this). The code was > just meant to show why we're running loop unrolling *after* the > second pass through loop opt ... Oops, it's 15+ years since I used to program in Fortran. I assumed N was a constant. > If -funroll-loops nevertheless generates worse code, than that's > due to something I haven't foreseen. It only generates poorer code for loops with a constant iteration count on machines with autoincrement addressing when using loop unrolling or BCT optimisations. The same problem is exhibited on a single loop pass for examples such as void fcopy1(float *a, float *b) { float *c; for (c = a; c < a + 4; ) *c++ = *b++; } Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-17 19:32 Re-run of loop pass Michael Hayes 1998-10-18 0:15 ` Jeffrey A Law @ 1998-10-18 6:11 ` Toon Moene 1998-10-18 17:19 ` Michael Hayes 1 sibling, 1 reply; 23+ messages in thread From: Toon Moene @ 1998-10-18 6:11 UTC (permalink / raw) To: Michael Hayes; +Cc: egcs [ Because I invented this hack, it's appropriate I write an apology on it - however, don't expect a Platonesk Dialogue ] > What's the philosophy of rerunning the loop optimisation > pass twice? It's a poor man's device to cover up that the various *_giv routines in loop.c can't handle general induction variables of the form CONSTANT + INVARIANT + BIV * INVARIANT. To quote my own message to the egcs mailing list (before it went public - you won't find this in the archives): <QUOTE> 2. GIVs of the form (INVARIANT * BIV + INVARIANT + CONSTANT) are not recognised by strength reduction. These givs are generated in the address expressions resulting from accessing arrays that are not 0-based. Because Fortran arrays that are not explicitly declared as being 0-based are 1-based, strength reduction is not very effective for loops in Fortran (in fact, part of the above expression is reduced). Short term solution - run loop optimisation twice to get the expression completely reduced. Long term solution: Make the various *_giv_* routines in loop.c recognise the above expression as a giv - this might not even be hard for the person who _wrote_ those routines in the first place. </QUOTE> > I've noticed that this can generate poorer code when > unrolling loops or using the bct optimisation if a BIV > is eliminated on the first pass. I do not completely understand this remark. The sequence is: first-pass-through-loop; second-pass-through-loop; pass-through-unroll; bct-optimisations. How can the elimination of a BIV be so important then ? It can either happen in the first or the second pass through loop, but that shouldn't matter. > On the second pass, loop_iterations is often unable to > determine the iteration count for loops with limits that > should be able to be determined at compile time. Loop > unrolling or the bct optimisation then makes a dog's > breakfast of things. Do you have an example of this at hand ? I fail to see how this could be happening. Thanks, Toon. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Re-run of loop pass 1998-10-18 6:11 ` Toon Moene @ 1998-10-18 17:19 ` Michael Hayes 0 siblings, 0 replies; 23+ messages in thread From: Michael Hayes @ 1998-10-18 17:19 UTC (permalink / raw) To: Toon Moene; +Cc: Michael Hayes, egcs Toon Moene writes: > I do not completely understand this remark. The sequence is: > > first-pass-through-loop; second-pass-through-loop; > pass-through-unroll; bct-optimisations. > > How can the elimination of a BIV be so important then ? It can > either happen in the first or the second pass through loop, but that > shouldn't matter. > Do you have an example of this at hand ? I fail to see how this > could be happening. OK, here's a simple example void fcopy1(float *a, float *b) { int i; for (i = 0; i < 4; i++) a[i] = b[i]; } The loop on the first pass gets converted to for (p = a, q = b, r = a + 4; p < r; ) *p++ = *q++; The second pass then gives up trying to calculate the number of loop iterations since the initial value of p is not a constant. Now some algebra could be applied to determine the number of loop iterations but this would involve tracking register values. While it would be desirable to directly determine the number of loop iterations of such a modified loop, the easiest solution might be to cache the iteration counts computed on the first pass. As I mentioned previously, the problem appears with machine having autoincrement addressing modes. For example, compiling for the m68k generates (gcc -O2 -funroll-loops): .file "fcopy1.c" .version "01.01" gcc2_compiled.: .text .align 2 .globl fcopy1 .type fcopy1,@function fcopy1: link.w %a6,#0 movm.l #0x3030,-(%sp) move.l 8(%a6),%a2 move.l 12(%a6),%a3 moveq.l #12,%d2 add.l %a2,%d2 moveq.l #13,%d0 add.l %a2,%d0 moveq.l #13,%d1 cmp.l %a2,%d0 jble .L9 tst.l %d1 jbeq .L5 moveq.l #4,%d3 cmp.l %d1,%d3 jbge .L9 moveq.l #8,%d3 cmp.l %d1,%d3 jbge .L10 cmp.l %d1,%d1 jble .L5 move.l (%a3)+,(%a2)+ .L10: move.l (%a3)+,(%a2)+ .L9: move.l (%a3)+,(%a2)+ cmp.l %a2,%d2 jblt .L7 .align 2 .L5: move.l %a2,%a1 move.l %a3,%a0 move.l (%a0)+,(%a1)+ move.l (%a0),(%a1) move.l 8(%a3),8(%a2) move.l 12(%a3),12(%a2) lea (16,%a3),%a3 lea (16,%a2),%a2 cmp.l %a2,%d2 jbge .L5 .L7: movm.l (%sp)+,#0xc0c unlk %a6 rts .Lfe1: .size fcopy1,.Lfe1-fcopy1 .ident "GCC: (GNU) egcs-2.92.15 19981018 (gcc2 ss-980609 experimental)" ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~1998-11-05 5:01 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 1998-10-17 19:32 Re-run of loop pass Michael Hayes 1998-10-18 0:15 ` Jeffrey A Law 1998-10-18 3:11 ` Michael Hayes 1998-10-18 11:04 ` David Edelsohn 1998-10-18 12:43 ` Mark Mitchell 1998-10-18 18:52 ` Michael Hayes 1998-10-21 23:18 ` Richard Henderson 1998-10-22 9:28 ` Jeffrey A Law 1998-10-22 21:15 ` Toon Moene 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Michael Hayes 1998-10-23 2:57 ` Jeffrey A Law 1998-10-23 2:57 ` Michael Hayes 1998-10-25 13:53 ` Problem compiling python Hasan Diwan 1998-10-30 2:40 ` Alexandre Oliva 1998-11-04 20:47 ` Re-run of loop pass Joern Rennecke 1998-11-05 0:28 ` Michael Hayes 1998-11-04 20:47 ` Joern Rennecke 1998-11-05 5:01 ` Michael Hayes 1998-10-23 17:45 ` Toon Moene 1998-10-23 17:45 ` Michael Hayes 1998-10-18 6:11 ` Toon Moene 1998-10-18 17:19 ` Michael Hayes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).