-fssa kicks butt on alphaev6 ieee floating point code

public inbox for gcc@gcc.gnu.org

* -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-16  7:46 Brad Lucier
  2000-03-16 10:25 ` Richard Henderson
  0 siblings, 1 reply; 9+ messages in thread
From: Brad Lucier @ 2000-03-16  7:46 UTC
  To: gcc; +Cc: feeley, hosking, lucier, nlucier

-fssa definitely helps IEEE floating-point register allocation on
alphaev6.  

The executive summary: with 

-mcpu=ev6 -fno-math-errno -fPIC -O2

the test file below was scheduled in the following number of clocks:

			-fssa		(no -fssa)
-mieee			29		60
(no ieee)		30(!)		48

The details: with this test file:

double
inner (double *x, double *y)
{
  double r0, r1, r2, r3, r4, r5, r6, r7;
    r0 = x[0]; r1 = y[0]; r0 = r0 * r1;
    r1 = x[1]; r2 = y[1]; r1 = r1 * r2;
    r2 = x[2]; r3 = y[2]; r2 = r2 * r3;
    r3 = x[3]; r4 = y[3]; r3 = r3 * r4;
    r4 = x[4]; r5 = y[4]; r4 = r4 * r5;
    r5 = x[5]; r6 = y[5]; r5 = r5 * r6;
    r6 = x[6]; r7 = y[6]; r6 = r6 * r7;
    return (r1 + (r2 + (r3 + (r4 + (r5 + r6)))));
}

with gcc 20000315, compiled with

 gcc -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2 -S test.c

With -fssa, I get

	.file	1 "test.c"
	.set noat
	.set noreorder
	.arch ev6
.text
	.align 5
	.globl inner
	.ent inner
inner:
	.eflag 48
	.frame $30,0,$26,0
$inner..ng:
	.prologue 0
	ldt $f11,48($16)
	ldt $f10,48($17)
	ldt $f14,40($16)
	ldt $f12,40($17)
	ldt $f22,32($17)
	ldt $f28,32($16)
	ldt $f13,16($16)
	ldt $f26,24($16)
	multsu $f11,$f10,$f27
	ldt $f10,16($17)
	ldt $f11,8($17)
	multsu $f14,$f12,$f23
	ldt $f25,24($17)
	ldt $f24,8($16)
	multsu $f28,$f22,$f14
	multsu $f13,$f10,$f22
	addtsu $f23,$f27,$f12
	multsu $f26,$f25,$f15
	multsu $f24,$f11,$f13
	addtsu $f14,$f12,$f10
	addtsu $f15,$f10,$f11
	addtsu $f22,$f11,$f12
	addtsu $f13,$f12,$f0
	ret $31,($26),1
	.end inner
	.ident	"GCC: (GNU) 2.96 20000315 (experimental)"

which is scheduled in 29 clocks; without -fssa, I get

	.file	1 "test.c"
	.set noat
	.set noreorder
	.arch ev6
.text
	.align 5
	.globl inner
	.ent inner
inner:
	.eflag 48
	.frame $30,0,$26,0
$inner..ng:
	.prologue 0
	ldt $f22,8($17)
	ldt $f13,8($16)
	ldt $f12,16($17)
	ldt $f15,24($17)
	ldt $f14,32($17)
	ldt $f10,40($17)
	ldt $f11,48($17)
	multsu $f13,$f22,$f23
	ldt $f22,16($16)
	fmov $f23,$f13
	multsu $f22,$f12,$f23
	ldt $f12,24($16)
	fmov $f23,$f22
	multsu $f12,$f15,$f23
	ldt $f15,32($16)
	fmov $f23,$f12
	multsu $f15,$f14,$f23
	ldt $f14,40($16)
	fmov $f23,$f15
	multsu $f14,$f10,$f23
	ldt $f10,48($16)
	fmov $f23,$f14
	multsu $f10,$f11,$f23
	fmov $f23,$f10
	addtsu $f14,$f10,$f11
	fmov $f11,$f10
	addtsu $f15,$f10,$f11
	addtsu $f12,$f11,$f23
	fmov $f23,$f12
	addtsu $f22,$f12,$f10
	addtsu $f13,$f10,$f0
	ret $31,($26),1
	.end inner
	.ident	"GCC: (GNU) 2.96 20000315 (experimental)"

which is scheduled in 60 clocks.

So this is great! I did the same tests without -mieee, and there were
no fmovs, but the -fssa code was scheduled in 30 cycles (longer than the
-mieee code!), and the code without -fssa was scheduled in 48 cycles.

So now I'll run the test suite with -fssa enabled, if I can figure out
how to do it.

Brad Lucier

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16  7:46 -fssa kicks butt on alphaev6 ieee floating point code Brad Lucier
@ 2000-03-16 10:25 ` Richard Henderson
  2000-03-16 10:38   ` Brad Lucier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 10:25 UTC
  To: Brad Lucier; +Cc: gcc, feeley, hosking, nlucier

On Thu, Mar 16, 2000 at 10:43:23AM -0500, Brad Lucier wrote:
> -fssa definitely helps IEEE floating-point register allocation on
> alphaev6.  

It's an accident.  What it means is that our register allocator
really bites.


r~

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16 10:25 ` Richard Henderson
@ 2000-03-16 10:38   ` Brad Lucier
  2000-03-16 11:48     ` Richard Henderson
  0 siblings, 1 reply; 9+ messages in thread
From: Brad Lucier @ 2000-03-16 10:38 UTC
  To: Richard Henderson; +Cc: lucier, gcc, feeley

> 
> On Thu, Mar 16, 2000 at 10:43:23AM -0500, Brad Lucier wrote:
> > -fssa definitely helps IEEE floating-point register allocation on
> > alphaev6.  
> 
> It's an accident.  What it means is that our register allocator
> really bites.

I disagree completely.  

The register allocator cannot deal well with the constraint
that a source operand cannot overlap a destination operand.
Translating the intermediate representation to a canonical
form (which is what translating to SSA and back again does)
to help the register allocator in this case seems a perfectly
reasonable strategy.  (Which is why I thought of trying this
in the first place.)

Brad
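
A minimal sketch of what "translating to a canonical form" could look like
at the source level, assuming the benefit comes from giving each product and
partial sum its own name, so that no multiply writes a variable it also
reads.  The names a1..a6, b1..b6, p1..p6, s1..s5 and inner_ssa are made up
for illustration; the -fssa pass itself works on the compiler's internal
representation, not on C variables.

```c
/* The posted test function, minus the r0 product that the posted return
   expression never uses.  Each rN is loaded and then overwritten by its
   own product, so rN is both a source and the destination. */
double
inner (double *x, double *y)
{
  double r1, r2, r3, r4, r5, r6, r7;
  r1 = x[1]; r2 = y[1]; r1 = r1 * r2;
  r2 = x[2]; r3 = y[2]; r2 = r2 * r3;
  r3 = x[3]; r4 = y[3]; r3 = r3 * r4;
  r4 = x[4]; r5 = y[4]; r4 = r4 * r5;
  r5 = x[5]; r6 = y[5]; r5 = r5 * r6;
  r6 = x[6]; r7 = y[6]; r6 = r6 * r7;
  return (r1 + (r2 + (r3 + (r4 + (r5 + r6)))));
}

/* Single-assignment rewrite (hypothetical names): every load, product
   and partial sum gets a fresh variable that is written exactly once. */
double
inner_ssa (double *x, double *y)
{
  const double a1 = x[1], b1 = y[1], p1 = a1 * b1;
  const double a2 = x[2], b2 = y[2], p2 = a2 * b2;
  const double a3 = x[3], b3 = y[3], p3 = a3 * b3;
  const double a4 = x[4], b4 = y[4], p4 = a4 * b4;
  const double a5 = x[5], b5 = y[5], p5 = a5 * b5;
  const double a6 = x[6], b6 = y[6], p6 = a6 * b6;
  /* Same association order as the posted return expression. */
  const double s5 = p5 + p6;
  const double s4 = p4 + s5;
  const double s3 = p3 + s4;
  const double s2 = p2 + s3;
  const double s1 = p1 + s2;
  return s1;
}
```

Both functions add the products in the same order, so they should return
the same value for the same inputs.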

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16 10:38   ` Brad Lucier
@ 2000-03-16 11:48     ` Richard Henderson
  2000-03-16 12:01       ` Brad Lucier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 11:48 UTC
  To: lucier; +Cc: gcc, feeley

On Thu, Mar 16, 2000 at 01:37:52PM -0500, Brad Lucier wrote:
> > It's an accident.  What it means is that our register allocator
> > really bites.
> 
> The register allocator cannot deal well with the constraint
> that a source operand cannot overlap a destination operand.

Yes.  Which exactly proves my point.

A better register allocator would take these constraints into
account much earlier.  A slightly smarter register allocator
that's not quite smart enough will happily undo any changes
that going to and from SSA form bought you.  Combine will often
work to thwart you as well.

r~
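
To make the constraint under discussion concrete, here is a small sketch
of a check for it, assuming the rule is simply that the destination
register of an FP operation must differ from both of its source registers.
The struct fp_insn and the function names are hypothetical and are not
taken from GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a three-operand FP instruction such as
   "multsu $f13,$f22,$f23": two source registers and one destination,
   identified by register number only. */
struct fp_insn
{
  int src1;
  int src2;
  int dst;
};

/* True when the destination reuses one of the source registers. */
bool
dst_overlaps_src (struct fp_insn insn)
{
  return insn.dst == insn.src1 || insn.dst == insn.src2;
}

/* Count the instructions in a sequence that break the assumed rule. */
size_t
count_overlaps (const struct fp_insn *insns, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (dst_overlaps_src (insns[i]))
      count++;
  return count;
}
```

For instance, "multsu $f13,$f22,$f23" from the listing passes this check,
while a hypothetical "multsu $f13,$f22,$f13" would not.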

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16 11:48     ` Richard Henderson
@ 2000-03-16 12:01       ` Brad Lucier
  2000-03-16 12:04         ` Richard Henderson
  2000-03-16 13:17         ` Jeffrey A Law
  0 siblings, 2 replies; 9+ messages in thread
From: Brad Lucier @ 2000-03-16 12:01 UTC
  To: Richard Henderson; +Cc: lucier, gcc, feeley

> A better register allocator would take these constraints into
> account much earlier.  A slightly smarter register allocator
> that's not quite smart enough will happily undo any changes
> that going to and from SSA form bought you.  Combine will often
> work to thwart you as well.

OK, but it didn't seem like anyone was going to rewrite the register
allocator anytime soon just so that IEEE FP arithmetic on the 21264 will
go faster, so it seems to me that adding -fssa is a reasonable strategy for 
now.

Or are you saying that adding -fssa right now will not always produce
better code on the 264 with -mieee?

Brad

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16 12:01       ` Brad Lucier
@ 2000-03-16 12:04         ` Richard Henderson
  2000-03-16 13:17         ` Jeffrey A Law
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2000-03-16 12:04 UTC
  To: Brad Lucier; +Cc: gcc, feeley

On Thu, Mar 16, 2000 at 03:01:32PM -0500, Brad Lucier wrote:
> ... so it seems to me that adding -fssa is a reasonable strategy for now.

Probably.

> Or are you saying that adding -fssa right now will not always produce
> better code on the 264 with -mieee?

I have no idea.  It doesn't seem likely that it would wind up
vastly worse.


r~

* Re: -fssa kicks butt on alphaev6 ieee floating point code
  2000-03-16 12:01       ` Brad Lucier
  2000-03-16 12:04         ` Richard Henderson
@ 2000-03-16 13:17         ` Jeffrey A Law
  1 sibling, 0 replies; 9+ messages in thread
From: Jeffrey A Law @ 2000-03-16 13:17 UTC
  To: Brad Lucier; +Cc: Richard Henderson, gcc, feeley

  In message <200003162001.PAA26193@polya.math.purdue.edu> you write:
  > Or are you saying that adding -fssa right now will not always produce
  > better code on the 264 with -mieee?
Correct.  Sometimes it might do better, sometimes it might do worse.  It
really depends on the code, its critical path, the number of registers
it needs, etc.

It is rather rare in compiler development to find any non-trivial
transformation that is always an improvement.

jeff

* Re: -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-23 18:59 Brad Lucier
  0 siblings, 0 replies; 9+ messages in thread
From: Brad Lucier @ 2000-03-23 18:59 UTC
  To: rth; +Cc: gcc, feeley, hosking, lucier

> From rth@cygnus.com  Thu Mar 16 14:49:01 2000
> On Thu, Mar 16, 2000 at 01:37:52PM -0500, Brad Lucier wrote:
> > > It's an accident.  What it means is that our register allocator
> > > really bites.
> > 
> > The register allocator cannot deal well with the constraint
> > that a source operand cannot overlap a destination operand.
> 
> Yes.  Which exactly proves my point.
> 
> A better register allocator would take these constraints into
> account much earlier.  A slightly smarter register allocator
> that's not quite smart enough will happily undo any changes
> that going to and from SSA form bought you.  Combine will often
> work to thwart you as well.

I don't want to belabor this point too much, but before trying the
-fssa flag, I thought about what the register allocator would need
to do to take these constraints into account earlier, and I thought
that the SSA canonical form was precisely the kind of transformation
the register allocator would need to generate better code.  (I.e.,
the register allocator would have to be pessimizing to undo what SSA
does.) So, to my mind, it was no "accident" that it happened to work
(I'm not randomly trying flags to try to get better code generation).

It is really a shame that the -fssa transformations are buggy now (or else
expose bugs in other parts of the compiler), because it makes one hell of
a difference on some codes.  For example, in a 3-D geometry code I have,
the composition of two 4x4 matrix transformations is compiled to 124
instructions that are scheduled in 70 cycles with -fssa, and compiled
to 223 instructions that are scheduled in 367 cycles without -fssa.
So we're talking a factor of 5 difference.

Brad
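
As a quick check of the figures above, the ratios can be computed
directly.  This is only arithmetic on the numbers quoted in the message;
the function name ratio is made up for illustration.

```c
/* Ratio of a count measured without -fssa to the same count with -fssa. */
double
ratio (double without_ssa, double with_ssa)
{
  return without_ssa / with_ssa;
}

/* ratio (367.0, 70.0)  : scheduled cycles, roughly 5.2
   ratio (223.0, 124.0) : instruction count, roughly 1.8 */
```

So the cycle count improves by a bit more than a factor of 5, while the
instruction count drops by a bit less than a factor of 2.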

* Re: -fssa kicks butt on alphaev6 ieee floating point code
@ 2000-03-16  8:15 Kaveh R. Ghazi
  0 siblings, 0 replies; 9+ messages in thread
From: Kaveh R. Ghazi @ 2000-03-16  8:15 UTC
  To: gcc, lucier

 > From: Brad Lucier <lucier@popov.math.purdue.edu>
 > 
 > So now I'll run the test suite with -fssa enabled, if I can figure out
 > how to do it.
 > Brad Lucier

http://gcc.gnu.org/fom_serv/cache/20.html
http://gcc.gnu.org/fom_serv/cache/21.html

--
Kaveh R. Ghazi			Engagement Manager / Project Services
ghazi@caip.rutgers.edu		Qwest Internet Solutions
