* -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-16 7:46 Brad Lucier
2000-03-16 10:25 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Brad Lucier @ 2000-03-16 7:46 UTC
To: gcc; +Cc: feeley, hosking, lucier, nlucier
-fssa definitely helps IEEE floating-point register allocation on
alphaev6.
The executive summary: with

    -mcpu=ev6 -fno-math-errno -fPIC -O2

the following test file was scheduled in the following number of clocks:

                 -fssa    without -fssa
    -mieee        29          60
    (no ieee)     30(!)       48

The details: with this test file:
double
inner (double *x, double *y)
{
  double r0, r1, r2, r3, r4, r5, r6, r7;

  r0 = x[0]; r1 = y[0]; r0 = r0 * r1;
  r1 = x[1]; r2 = y[1]; r1 = r1 * r2;
  r2 = x[2]; r3 = y[2]; r2 = r2 * r3;
  r3 = x[3]; r4 = y[3]; r3 = r3 * r4;
  r4 = x[4]; r5 = y[4]; r4 = r4 * r5;
  r5 = x[5]; r6 = y[5]; r5 = r5 * r6;
  r6 = x[6]; r7 = y[6]; r6 = r6 * r7;
  return (r1 + (r2 + (r3 + (r4 + (r5 + r6)))));
}
with gcc 20000315, compiled with
gcc -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2 -S test.c
With -fssa, I get
        .file 1 "test.c"
        .set noat
        .set noreorder
        .arch ev6
        .text
        .align 5
        .globl inner
        .ent inner
inner:
        .eflag 48
        .frame $30,0,$26,0
$inner..ng:
        .prologue 0
        ldt $f11,48($16)
        ldt $f10,48($17)
        ldt $f14,40($16)
        ldt $f12,40($17)
        ldt $f22,32($17)
        ldt $f28,32($16)
        ldt $f13,16($16)
        ldt $f26,24($16)
        multsu $f11,$f10,$f27
        ldt $f10,16($17)
        ldt $f11,8($17)
        multsu $f14,$f12,$f23
        ldt $f25,24($17)
        ldt $f24,8($16)
        multsu $f28,$f22,$f14
        multsu $f13,$f10,$f22
        addtsu $f23,$f27,$f12
        multsu $f26,$f25,$f15
        multsu $f24,$f11,$f13
        addtsu $f14,$f12,$f10
        addtsu $f15,$f10,$f11
        addtsu $f22,$f11,$f12
        addtsu $f13,$f12,$f0
        ret $31,($26),1
        .end inner
        .ident "GCC: (GNU) 2.96 20000315 (experimental)"
which is scheduled in 29 clocks; without -fssa, I get
        .file 1 "test.c"
        .set noat
        .set noreorder
        .arch ev6
        .text
        .align 5
        .globl inner
        .ent inner
inner:
        .eflag 48
        .frame $30,0,$26,0
$inner..ng:
        .prologue 0
        ldt $f22,8($17)
        ldt $f13,8($16)
        ldt $f12,16($17)
        ldt $f15,24($17)
        ldt $f14,32($17)
        ldt $f10,40($17)
        ldt $f11,48($17)
        multsu $f13,$f22,$f23
        ldt $f22,16($16)
        fmov $f23,$f13
        multsu $f22,$f12,$f23
        ldt $f12,24($16)
        fmov $f23,$f22
        multsu $f12,$f15,$f23
        ldt $f15,32($16)
        fmov $f23,$f12
        multsu $f15,$f14,$f23
        ldt $f14,40($16)
        fmov $f23,$f15
        multsu $f14,$f10,$f23
        ldt $f10,48($16)
        fmov $f23,$f14
        multsu $f10,$f11,$f23
        fmov $f23,$f10
        addtsu $f14,$f10,$f11
        fmov $f11,$f10
        addtsu $f15,$f10,$f11
        addtsu $f12,$f11,$f23
        fmov $f23,$f12
        addtsu $f22,$f12,$f10
        addtsu $f13,$f10,$f0
        ret $31,($26),1
        .end inner
        .ident "GCC: (GNU) 2.96 20000315 (experimental)"
which is scheduled in 60 clocks.
So this is great! I did the same tests without -mieee, and there were
no fmovs, but the -fssa code was scheduled in 30 cycles (longer than the
-mieee code!), and the code without -fssa was scheduled in 48 cycles.
So next I'll run the test suite with -fssa enabled, if I can figure out
how to do it.
Brad Lucier
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 7:46 -fssa kicks butt on alphaev6 ieee floating point code Brad Lucier
@ 2000-03-16 10:25 ` Richard Henderson
2000-03-16 10:38 ` Brad Lucier
0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 10:25 UTC
To: Brad Lucier; +Cc: gcc, feeley, hosking, nlucier
On Thu, Mar 16, 2000 at 10:43:23AM -0500, Brad Lucier wrote:
> -fssa definitely helps IEEE floating-point register allocation on
> alphaev6.
It's an accident. What it means is that our register allocator
really bites.
r~
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 10:25 ` Richard Henderson
@ 2000-03-16 10:38 ` Brad Lucier
2000-03-16 11:48 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Brad Lucier @ 2000-03-16 10:38 UTC
To: Richard Henderson; +Cc: lucier, gcc, feeley
>
> On Thu, Mar 16, 2000 at 10:43:23AM -0500, Brad Lucier wrote:
> > -fssa definitely helps IEEE floating-point register allocation on
> > alphaev6.
>
> It's an accident. What it means is that our register allocator
> really bites.
I disagree completely.
The register allocator cannot deal well with the constraint
that a source operand cannot overlap a destination operand.
Translating the intermediate representation to a canonical
form (which is what translating to SSA and back again does)
to help the register allocator in this case seems a perfectly
reasonable strategy. (Which is why I thought of trying this
in the first place.)
Brad
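
A minimal sketch of the "canonical form" idea at the source level, assuming the effect of going to SSA and back can be mimicked by renaming the temporaries of the test file by hand. The function names inner_reused and inner_single_assignment are made up for this illustration, and the real transformation works on GCC's intermediate representation, not on C source. The product of x[0] and y[0], which the test file computes but never uses in its return value, is left out. In the first function each multiply writes to a variable that is also one of its inputs; in the second every value has its own name, so no destination shares a name with a source.

```c
/* Illustration only: a hand-written analogue of SSA renaming,
   not something GCC emits. */

/* Reused temporaries, in the style of the test file: each product
   overwrites one of its own operands (r1 = r1 * r2, and so on). */
double
inner_reused (const double *x, const double *y)
{
  double r1, r2, r3, r4, r5, r6, r7;

  r1 = x[1]; r2 = y[1]; r1 = r1 * r2;
  r2 = x[2]; r3 = y[2]; r2 = r2 * r3;
  r3 = x[3]; r4 = y[3]; r3 = r3 * r4;
  r4 = x[4]; r5 = y[4]; r4 = r4 * r5;
  r5 = x[5]; r6 = y[5]; r5 = r5 * r6;
  r6 = x[6]; r7 = y[6]; r6 = r6 * r7;
  return (r1 + (r2 + (r3 + (r4 + (r5 + r6)))));
}

/* Single-assignment form: every loaded value and every product gets
   a fresh name, so a product never shares a name with its inputs. */
double
inner_single_assignment (const double *x, const double *y)
{
  const double a1 = x[1], b1 = y[1], p1 = a1 * b1;
  const double a2 = x[2], b2 = y[2], p2 = a2 * b2;
  const double a3 = x[3], b3 = y[3], p3 = a3 * b3;
  const double a4 = x[4], b4 = y[4], p4 = a4 * b4;
  const double a5 = x[5], b5 = y[5], p5 = a5 * b5;
  const double a6 = x[6], b6 = y[6], p6 = a6 * b6;

  return (p1 + (p2 + (p3 + (p4 + (p5 + p6)))));
}
```

Both functions perform the same operations in the same order, so they return the same value; only the naming of the intermediate values differs.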
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 10:38 ` Brad Lucier
@ 2000-03-16 11:48 ` Richard Henderson
2000-03-16 12:01 ` Brad Lucier
0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 11:48 UTC
To: lucier; +Cc: gcc, feeley
On Thu, Mar 16, 2000 at 01:37:52PM -0500, Brad Lucier wrote:
> > It's an accident. What it means is that our register allocator
> > really bites.
>
> The register allocator cannot deal well with the constraint
> that a source operand cannot overlap a destination operand.
Yes. Which exactly proves my point.
A better register allocator would take these constraints into
account much earlier. A slightly smarter register allocator
that's not quite smart enough will happily undo any changes
that going to and from SSA form bought you. Combine will often
work to thwart you as well.
r~
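
One way to picture "taking these constraints into account much earlier" is to record the no-overlap rule as a conflict before any registers are assigned. What follows is a toy model under stated assumptions, not GCC's register allocator: the types insn and conflict_graph, the limit MAX_PSEUDOS, and both function names are hypothetical, and real liveness analysis is reduced to "the two inputs of an instruction are live at the same time". An instruction whose destination pseudo is also one of its sources cannot satisfy the rule by register choice alone and would need an extra copy, which may be what the fmov instructions in the listing without -fssa correspond to.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PSEUDOS 16          /* hypothetical limit for this toy model */

/* dst = src1 op src2; operands are pseudo-register numbers
   in the range 0 .. MAX_PSEUDOS - 1. */
struct insn { int dst, src1, src2; };

struct conflict_graph { bool edge[MAX_PSEUDOS][MAX_PSEUDOS]; };

static void
add_edge (struct conflict_graph *g, int a, int b)
{
  if (a != b)
    g->edge[a][b] = g->edge[b][a] = true;
}

/* Record, for every instruction, that the destination may not share a
   register with either source.  Returns the number of instructions
   whose destination pseudo is itself a source; those cannot be fixed
   by colouring and would need a copy.  */
size_t
add_no_overlap_conflicts (const struct insn *insns, size_t n,
                          struct conflict_graph *g)
{
  size_t need_copy = 0;

  for (size_t i = 0; i < n; i++)
    {
      const struct insn *in = &insns[i];

      add_edge (g, in->src1, in->src2);   /* both inputs live at once */
      add_edge (g, in->dst, in->src1);    /* the no-overlap constraint */
      add_edge (g, in->dst, in->src2);
      if (in->dst == in->src1 || in->dst == in->src2)
        need_copy++;
    }
  return need_copy;
}

/* Greedy colouring: give each pseudo the lowest-numbered register not
   already taken by a conflicting, lower-numbered pseudo.  Requires
   npseudos <= MAX_PSEUDOS.  */
void
assign_registers (const struct conflict_graph *g, int npseudos,
                  int hard_reg[])
{
  for (int p = 0; p < npseudos; p++)
    {
      bool used[MAX_PSEUDOS] = { false };
      int r = 0;

      for (int q = 0; q < p; q++)
        if (g->edge[p][q])
          used[hard_reg[q]] = true;
      while (used[r])
        r++;
      hard_reg[p] = r;
    }
}
```

In this model a reused-temporary multiply such as { 1, 1, 2 } is counted as needing a copy, while the renamed form { 2, 0, 1 } is not and simply receives a destination register distinct from both inputs.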
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 11:48 ` Richard Henderson
@ 2000-03-16 12:01 ` Brad Lucier
2000-03-16 12:04 ` Richard Henderson
2000-03-16 13:17 ` Jeffrey A Law
0 siblings, 2 replies; 9+ messages in thread
From: Brad Lucier @ 2000-03-16 12:01 UTC
To: Richard Henderson; +Cc: lucier, gcc, feeley
> A better register allocator would take these constraints into
> account much earlier. A slightly smarter register allocator
> that's not quite smart enough will happily undo any changes
> that going to and from SSA form bought you. Combine will often
> work to thwart you as well.
OK, but it didn't seem like anyone was going to rewrite the register
allocator anytime soon just so that IEEE FP arithmetic on the 21264 will
go faster, so it seems to me that adding -fssa is a reasonable strategy for
now.
Or are you saying that adding -fssa right now will not always produce
better code on the 264 with -mieee?
Brad
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 12:01 ` Brad Lucier
@ 2000-03-16 12:04 ` Richard Henderson
2000-03-16 13:17 ` Jeffrey A Law
1 sibling, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 12:04 UTC
To: Brad Lucier; +Cc: gcc, feeley
On Thu, Mar 16, 2000 at 03:01:32PM -0500, Brad Lucier wrote:
> ... so it seems to me that adding -fssa is a reasonable strategy for now.
Probably.
> Or are you saying that adding -fssa right now will not always produce
> better code on the 264 with -mieee?
I have no idea. It doesn't seem likely that it would wind up
vastly worse.
r~
* Re: -fssa kicks butt on alphaev6 ieee floating point code
2000-03-16 12:01 ` Brad Lucier
2000-03-16 12:04 ` Richard Henderson
@ 2000-03-16 13:17 ` Jeffrey A Law
1 sibling, 0 replies; 9+ messages in thread
From: Jeffrey A Law @ 2000-03-16 13:17 UTC
To: Brad Lucier; +Cc: Richard Henderson, gcc, feeley
In message <200003162001.PAA26193@polya.math.purdue.edu> you write:
> Or are you saying that adding -fssa right now will not always produce
> better code on the 264 with -mieee?
Correct. Sometimes it might do better, sometimes it might do worse. It
really depends on the code, its critical path, the number of registers
it needs, etc etc etc.
It is rather rare in compiler development to find any non-trivial
transformation that is always an improvement.
jeff
* Re: -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-23 18:59 Brad Lucier
0 siblings, 0 replies; 9+ messages in thread
From: Brad Lucier @ 2000-03-23 18:59 UTC
To: rth; +Cc: gcc, feeley, hosking, lucier
> From rth@cygnus.com Thu Mar 16 14:49:01 2000
> On Thu, Mar 16, 2000 at 01:37:52PM -0500, Brad Lucier wrote:
> > > It's an accident. What it means is that our register allocator
> > > really bites.
> >
> > The register allocator cannot deal well with the constraint
> > that a source operand cannot overlap a destination operand.
>
> Yes. Which exactly proves my point.
>
> A better register allocator would take these constraints into
> account much earlier. A slightly smarter register allocator
> that's not quite smart enough will happily undo any changes
> that going to and from SSA form bought you. Combine will often
> work to thwart you as well.
I don't want to belabor this point too much, but before trying the
-fssa flag, I thought about what the register allocator would need
to do to take these constraints into account earlier, and I thought
that the SSA canonical form was precisely the kind of transformation
the register allocator would need to generate better code. (I.e.,
the register allocator would have to be pessimizing to undo what SSA
does.) So, to my mind, it was no "accident" that it happened to work
(I'm not randomly trying flags to try to get better code generation).
It is really a shame that the -fssa transformations are buggy now (or else
expose bugs in other parts of the compiler), because it makes one hell of
a difference on some codes. For example, in a 3-D geometry code I have,
the composition of two 4x4 matrix transformations is compiled to 124
instructions that are scheduled in 70 cycles with -fssa, and compiled
to 223 instructions that are scheduled in 367 cycles without -fssa.
So we're talking a factor of 5 difference.
Brad
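
For reference, the cycle counts quoted in this thread work out as follows. This is only arithmetic on the numbers given above; the struct sched_result, the results table, and the function names are made up for the illustration, and the counts are scheduled cycles as reported in the messages, not measured run times.

```c
#include <stdio.h>

/* Scheduled cycle counts as quoted in the thread. */
struct sched_result {
  const char *what;
  int cycles_with_ssa;
  int cycles_without_ssa;
};

static const struct sched_result results[] = {
  { "inner, -mieee",          29,  60 },
  { "inner, no -mieee",       30,  48 },
  { "4x4 matrix composition", 70, 367 },
};

/* Ratio of cycles without -fssa to cycles with -fssa. */
double
cycle_ratio (const struct sched_result *r)
{
  return (double) r->cycles_without_ssa / r->cycles_with_ssa;
}

/* Print one line per quoted measurement. */
void
print_ratios (void)
{
  for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
    printf ("%-24s %3d vs %3d cycles, ratio %.2f\n",
            results[i].what, results[i].cycles_with_ssa,
            results[i].cycles_without_ssa, cycle_ratio (&results[i]));
}
```

The ratios come to roughly 2.07 for the -mieee test, 1.6 without -mieee, and 5.24 for the matrix composition, which is the "factor of 5" above. The instruction counts for the matrix code differ by less, 223 against 124, or about 1.8.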
* Re: -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-16 8:15 Kaveh R. Ghazi
0 siblings, 0 replies; 9+ messages in thread
From: Kaveh R. Ghazi @ 2000-03-16 8:15 UTC
To: gcc, lucier
> From: Brad Lucier <lucier@popov.math.purdue.edu>
>
> So next I'll run the test suite with -fssa enabled, if I can figure out
> how to do it.
> Brad Lucier
http://gcc.gnu.org/fom_serv/cache/20.html
http://gcc.gnu.org/fom_serv/cache/21.html
--
Kaveh R. Ghazi Engagement Manager / Project Services
ghazi@caip.rutgers.edu Qwest Internet Solutions