* Register renaming works - but sched2 doesn't profit from it ?!?!
@ 2000-06-11 5:37 Toon Moene
From: Toon Moene @ 2000-06-11 5:37 UTC
To: gcc
L.S.,
A long, long time ago (Dec. 7, 1997) I sent this list a series of
optimisation opportunities that the (then egcs) compiler did not
exploit, of particular importance to Fortran programs.
One of them was the rescheduling of instructions after register renaming
in (unrolled) loops. Now that Stan Cox has recently added a register
renaming pass, this should have been taken care of.
Take, for instance, the following code:

      subroutine sum(a, b, c, n)
      integer i, n
      real a(n), b(n), c(n)
      do i = 1, n
         c(i) = a(i) + b(i)
      enddo
      end

The current CVS'd compiler generates, for the unrolled loop, on
alphaev6-unknown-linux-gnu (-O2 -funroll-loops -fno-rerun-loop-opt):

$L6:
        lds $f10,0($17)
        lds $f11,0($16)
        subl $4,3,$1
        lda $4,-4($4)
        addl $1,$31,$2
        adds $f11,$f10,$f11
        sts $f11,0($18)
        lds $f11,4($17)
        lds $f10,4($16)
        adds $f10,$f11,$f10
        sts $f10,4($18)
        lds $f10,8($17)
        lds $f11,8($16)
        adds $f11,$f10,$f11
        sts $f11,8($18)
        lds $f10,12($16)
        lda $16,16($16)
        lds $f11,12($17)
        lda $17,16($17)
        adds $f10,$f11,$f10
        sts $f10,12($18)
        lda $18,16($18)
        bge $2,$L6

and with -frename-registers:

$L6:
        lds $f23,0($16)
        lds $f12,0($17)
        subl $4,3,$1
        lda $4,-4($4)
        addl $1,$31,$2
        adds $f23,$f12,$f24
        sts $f24,0($18)
        lds $f13,4($16)
        lds $f25,4($17)
        adds $f13,$f25,$f14
        sts $f14,4($18)
        lds $f26,8($16)
        lds $f15,8($17)
        adds $f26,$f15,$f27
        sts $f27,8($18)
        lds $f22,12($16)
        lda $16,16($16)
        lds $f11,12($17)
        lda $17,16($17)
        adds $f22,$f11,$f10
        sts $f10,12($18)
        lda $18,16($18)
        bge $2,$L6

Note how all the floating point registers are renamed (they are
temporaries within the loop anyway) - thereby removing the false
dependencies between the four unrolled copies of the loop body - but
the sequence of instructions hasn't changed!
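
To make the effect of the renaming concrete, here is a minimal Python
sketch that counts the false (write-after-read and write-after-write)
dependencies between the floating point instructions of the two loop
bodies above. The helper names (fp_reads_writes, false_deps) are made up
for this illustration, the integer instructions are left out, and only a
single trip through the loop body is examined.

```python
import re

# FP instructions of the loop body without -frename-registers.
BEFORE = """
lds $f10,0($17)
lds $f11,0($16)
adds $f11,$f10,$f11
sts $f11,0($18)
lds $f11,4($17)
lds $f10,4($16)
adds $f10,$f11,$f10
sts $f10,4($18)
lds $f10,8($17)
lds $f11,8($16)
adds $f11,$f10,$f11
sts $f11,8($18)
lds $f10,12($16)
lds $f11,12($17)
adds $f10,$f11,$f10
sts $f10,12($18)
"""

# The same instructions with -frename-registers.
AFTER = """
lds $f23,0($16)
lds $f12,0($17)
adds $f23,$f12,$f24
sts $f24,0($18)
lds $f13,4($16)
lds $f25,4($17)
adds $f13,$f25,$f14
sts $f14,4($18)
lds $f26,8($16)
lds $f15,8($17)
adds $f26,$f15,$f27
sts $f27,8($18)
lds $f22,12($16)
lds $f11,12($17)
adds $f22,$f11,$f10
sts $f10,12($18)
"""


def fp_reads_writes(insn):
    """Return (FP registers read, FP registers written) for one instruction."""
    op, operands = insn.split(None, 1)
    regs = re.findall(r"\$f\d+", operands)
    if op == "lds":                 # load: the FP register is the destination
        return set(), set(regs)
    if op == "sts":                 # store: the FP register is a source
        return set(regs), set()
    if op == "adds":                # adds src1,src2,dest
        return set(regs[:2]), {regs[2]}
    raise ValueError("unexpected opcode: " + op)


def false_deps(listing):
    """Count pairs (earlier, later) where the later instruction overwrites
    a register that the earlier one reads (WAR) or writes (WAW)."""
    insns = [fp_reads_writes(l.strip()) for l in listing.strip().splitlines()]
    count = 0
    for j, (_, writes_j) in enumerate(insns):
        for reads_i, writes_i in insns[:j]:
            if writes_j & (reads_i | writes_i):
                count += 1
    return count


print("false dependencies before renaming:", false_deps(BEFORE))
print("false dependencies after renaming: ", false_deps(AFTER))
```

The count is nonzero for the first listing and zero for the renamed
one. The true load -> add -> store dependencies inside each copy of the
loop body are of course unaffected by renaming.
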
The following is close to optimal (derived by hand):

$L6:
        lds $f23,0($16)
        lds $f12,0($17)
        lds $f13,4($16)
        lds $f25,4($17)
        lds $f26,8($16)
        lds $f15,8($17)
        lds $f22,12($16)
        lds $f11,12($17)
        subl $4,3,$1
        lda $4,-4($4)
        addl $1,$31,$2
        adds $f23,$f12,$f24
        adds $f13,$f25,$f14
        adds $f26,$f15,$f27
        adds $f22,$f11,$f10
        lda $16,16($16)
        lda $17,16($17)
        sts $f24,0($18)
        sts $f14,4($18)
        sts $f27,8($18)
        sts $f10,12($18)
        lda $18,16($18)
        bge $2,$L6

whereby all the loads are moved to the top of the loop, and all the
stores to the bottom. This is approximately 12% faster on my 466 MHz
21264. On a statically scheduled machine (e.g. the 21164(A)) the
difference should be dramatic.
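
As a rough illustration of why instruction order matters so much on an
in-order machine, the following Python sketch pushes the renamed loop
body and the hand-scheduled one through a toy single-issue, in-order
model. Everything about the model is an assumption made for this
example: the latencies (3 cycles for lds, 4 for adds, 1 for the rest)
are illustrative and not taken from any Alpha manual, only register
dependencies are tracked, and memory dependencies are ignored (which is
reasonable here only because Fortran does not allow c to overlap a or
b). The names operands and issue_cycles are hypothetical.

```python
import re

# Assumed result latencies in cycles; illustrative values only.
LATENCY = {"lds": 3, "adds": 4}

# Loop body as generated with -frename-registers.
COMPILER_ORDER = """
lds $f23,0($16)
lds $f12,0($17)
subl $4,3,$1
lda $4,-4($4)
addl $1,$31,$2
adds $f23,$f12,$f24
sts $f24,0($18)
lds $f13,4($16)
lds $f25,4($17)
adds $f13,$f25,$f14
sts $f14,4($18)
lds $f26,8($16)
lds $f15,8($17)
adds $f26,$f15,$f27
sts $f27,8($18)
lds $f22,12($16)
lda $16,16($16)
lds $f11,12($17)
lda $17,16($17)
adds $f22,$f11,$f10
sts $f10,12($18)
lda $18,16($18)
bge $2,$L6
"""

# Hand-scheduled loop body: loads first, stores last.
HAND_ORDER = """
lds $f23,0($16)
lds $f12,0($17)
lds $f13,4($16)
lds $f25,4($17)
lds $f26,8($16)
lds $f15,8($17)
lds $f22,12($16)
lds $f11,12($17)
subl $4,3,$1
lda $4,-4($4)
addl $1,$31,$2
adds $f23,$f12,$f24
adds $f13,$f25,$f14
adds $f26,$f15,$f27
adds $f22,$f11,$f10
lda $16,16($16)
lda $17,16($17)
sts $f24,0($18)
sts $f14,4($18)
sts $f27,8($18)
sts $f10,12($18)
lda $18,16($18)
bge $2,$L6
"""


def operands(insn):
    """Return (opcode, source registers, destination register or None)."""
    op, rest = insn.split(None, 1)
    regs = re.findall(r"\$f?\d+", rest)   # skips literals and the $L6 label
    if op in ("lds", "lda"):              # destination first, then address
        return op, regs[1:], regs[0]
    if op in ("sts", "bge"):              # these only read registers
        return op, regs, None
    return op, regs[:-1], regs[-1]        # adds/subl/addl: destination last


def issue_cycles(listing):
    """Cycle in which the last instruction issues, one issue per cycle,
    stalling until every source register of an instruction is ready."""
    ready = {}                            # register -> cycle its value is ready
    cycle = 0
    for line in listing.strip().splitlines():
        op, srcs, dest = operands(line.strip())
        operands_ready = max((ready.get(r, 0) for r in srcs), default=0)
        cycle = max(cycle + 1, operands_ready)
        if dest is not None:
            ready[dest] = cycle + LATENCY.get(op, 1)
    return cycle


print("compiler order:", issue_cycles(COMPILER_ORDER), "cycles")
print("hand order:    ", issue_cycles(HAND_ORDER), "cycles")
```

In this toy model the hand-scheduled order issues its 23 instructions
without a single stall, whereas the compiler's order repeatedly waits
for a load or an add to complete. Real hardware issues several
instructions per cycle, so the absolute numbers mean little; only the
direction of the difference is of interest.
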
Premium question: why doesn't sched2 reschedule the instructions, even
though the dependencies are now completely different?
--
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
GNU Fortran 95: http://g95.sourceforge.net/ (under construction)