public inbox for gcc@gcc.gnu.org
* Another performance regression
@ 2002-09-26 12:16 Dale Johannesen
  2002-09-26 16:07 ` Richard Henderson
From: Dale Johannesen @ 2002-09-26 12:16 UTC (permalink / raw)
  To: gcc-patches, gcc; +Cc: Dale Johannesen

Try the program at the bottom with -O2 -funroll-loops.  Don't worry about
the body of the loops; that's only important insofar as it has the right
amount of code to cause the inner loop to be unrolled the right number of
times, namely 2, with 1 left over.  The unroller generates some rather
stupid code here:

           /* Calculate the difference between the final and initial values.
              Final value may be a (plus (reg x) (const_int 1)) rtx.
              Let the following cse pass simplify this if initial value is
              a constant.

(there's more to it besides the expression described above)
with the expectation that cse will clean it up.  However, the second pass
of loop optimization pulls some, but not all, of this code out of the
outer loop, with the effect that cse can't eliminate it.  On ppc, for
example, the beginning of the function looks like this:

         bge- cr0,L18        ; zero-trip check for outer loop
         li r0,1             ; unnecessary
         cmpwi cr1,r0,0      ; unnecessary
         cmpwi cr6,r0,25     ; unnecessary
L16:                        ; top of outer loop
         slwi r0,r6,2
         li r8,0
         add r7,r0,r28
         mr r10,r29
         bge+ cr6,L22        ; always false
         beq- cr1,L15        ; always false
L22:
         ... single copy of inner loop body...
L15:
         ... two copies of inner loop body, executed 12 times...
         ble L15
         ...
         blt L16
L18:
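
For readers who don't speak ppc assembly, the shape of the inner loop
after unrolling can be sketched in C.  This is only an illustration of a
loop unrolled by 2 with one iteration peeled off in front, not what
unroll.c actually emits; body(), calls and unrolled_by_2() are made-up
names, and the guard tests are simplified.  The point is that with
initial = 0 and final = 25 every guard has constant operands, so nothing
before the loop needs to survive to run time.

```c
/* Hypothetical stand-in for the inner loop body; it only counts calls. */
static int calls;

static void body(int i)
{
    (void)i;
    calls++;
}

/* Illustrative sketch: run body() for i in [initial, final), unrolled
   by 2, peeling one iteration up front when the trip count is odd. */
static void unrolled_by_2(int initial, int final)
{
    int i = initial;
    int diff = final - initial;  /* difference of final and initial values */

    if (diff <= 0)               /* zero-trip check */
        return;

    if (diff % 2 != 0) {         /* odd trip count: single copy of the body */
        body(i);
        i++;
    }

    while (i < final) {          /* two copies of the body per trip */
        body(i);
        body(i + 1);
        i += 2;
    }
}
```

Calling unrolled_by_2(0, 25) runs the single copy once and the doubled
loop 12 times, matching the counts in the listing above.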

I'm not entirely sure, but I think this patch was the culprit:

2002-07-21  Richard Henderson  <rth@redhat.com>

         * loop.h (LOOP_AUTO_UNROLL): Rename from LOOP_FIRST_PASS.
         * loop.c (strength_reduce): Update.
         * toplev.c (rest_of_compilation): Do unrolling in the first
         loop pass, not the second.

This didn't happen when unrolling was done last.

So should I fix this by making the unrolling code smarter, in effect
doing cse's job?  It seems likely Roger Sayle's approach of running
gcse after loop opts would Just Work.  Is that going to go in?


int foo(char *abcd00, int abcd01, char *abcd02, int *abcd03, int *abcd04) {
   int abcd05, abcd06, abcd07=0, abcd08=0, abcd09, abcd10, abcd11=0;
   for (abcd05=0;abcd05<abcd01;abcd05++) {
     for(abcd06=0;abcd06<25;abcd06++) {
       if(abcd00[abcd05]==abcd06 && abcd07<2) {
         if (abcd07==0) {
           abcd09=26*abcd03[abcd06]; abcd02[abcd08++]=abcd00[abcd05];
           abcd07=1;
         } else if (abcd07==1) {
           abcd10=abcd09+abcd03[abcd06]; abcd02[abcd08++]=abcd00[abcd05];
           abcd07=2;
         }
       }
       if(abcd07==2) {
         abcd04[abcd11++]=abcd10; abcd07=0;
         break;
       }
     }
   }
   return abcd07;
}

Thread overview: 3+ messages
2002-09-26 12:16 Another performance regression Dale Johannesen
2002-09-26 16:07 ` Richard Henderson
2002-09-27 16:09   ` Dale Johannesen
