IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
@ 2011-07-28  8:11 Sandra Loosemore
  2011-07-28 14:24 ` Richard Guenther
  2011-07-29 18:57 ` Vladimir Makarov
  0 siblings, 2 replies; 6+ messages in thread
From: Sandra Loosemore @ 2011-07-28  8:11 UTC (permalink / raw)
  To: gcc

Consider this bit of code:

extern double a[20];

double test1 (int n)
{
   double accum = 0.0;
   int i;

   for (i=0; i<n; i++) { accum -= a[i]; }
   accum = fabs (accum);
   return accum;
}

which is compiled for MIPS using

mipsisa32r2-sde-elf-gcc -O3 -fno-inline -fno-unroll-loops -march=74kf1_1 
-S abstest.c

With a GCC 4.6 compiler, this produces:
...
.L3:
	mtc1	$3,$f2
	ldc1	$f0,0($5)
	addiu	$5,$5,8
	mtc1	$2,$f3
	sub.d	$f2,$f2,$f0
	mfc1	$3,$f2
	bne	$5,$4,.L3
	mfc1	$2,$f3

	ext	$5,$2,0,31
	move	$4,$3
.L2:
	mtc1	$4,$f0
	j	$31
	mtc1	$5,$f1
...

This is terrible code, with all that pointless register-shuffling inside 
the loop -- what's gone wrong?  Well, the bit-twiddling expansion of 
"fabs" produced by optabs.c uses subreg expressions, and on MIPS 
CANNOT_CHANGE_MODE_CLASS disallows use of FP registers for integer 
operations.  And, when IRA sees that, it decides it cannot alloc "accum" 
to a FP reg at all, even if it obviously makes sense to put it there for 
the rest of its lifetime.

On mainline trunk, things are even worse as it's spilling to memory, not 
just shuffling between registers:

.L3:
	ldc1	$f0,0($2)
	addiu	$2,$2,8
	sub.d	$f2,$f2,$f0
	bne	$2,$3,.L3
	sdc1	$f2,0($sp)

	lw	$2,0($sp)
	ext	$3,$2,0,31
	lw	$2,4($sp)
.L2:
	sw	$2,4($sp)
	sw	$3,0($sp)
	lw	$3,4($sp)
	lw	$2,0($sp)
	addiu	$sp,$sp,8
	mtc1	$3,$f0
	j	$31
	mtc1	$2,$f1

I've been experimenting with a patch to the MIPS backend to add 
define_insn_and_split patterns for floating-point abs -- the idea is to 
attach some constraints to the insns to tell IRA it needs a GP reg for 
these operations, so it can apply its usual cost analysis and reload 
logic instead of giving up.  Then the split to introduce the subreg 
expansion happens after reload when we already have the right register 
class.  This seems to work well enough on 4.6; for this particular 
example, I'm getting:

.L3:
	ldc1	$f2,0($2)
	addiu	$2,$2,8
	bne	$2,$4,.L3
	sub.d	$f0,$f0,$f2

	mfc1	$2,$f1
	ext	$2,$2,0,31
	j	$31
	mtc1	$2,$f1

However, same patch on mainline is still giving spills to memory.  :-(

So, here's my question.  Is it worthwhile for me to continue this 
approach of trying to make the MIPS backend smarter?  Or is the way IRA 
deals with CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need of 
fixing in a target-inspecific way?  And/or is there some other 
regression in IRA on mainline that's causing it to spill to memory when 
it didn't used to in 4.6?

BTW, the unary "neg" operator has the same problem as "abs" on MIPS; 
can't use the hardware instruction because it does the wrong thing with 
NaNs, and can't twiddle the sign bit directly in a FP register.  With 
both abs/neg now generating unnecessary memory spills, this seems like a 
fairly important performance regression....

-Sandra

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
  2011-07-28  8:11 IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions? Sandra Loosemore
@ 2011-07-28 14:24 ` Richard Guenther
  2011-07-28 14:47   ` Sandra Loosemore
  2011-07-29 18:57 ` Vladimir Makarov
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Guenther @ 2011-07-28 14:24 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc

On Wed, Jul 27, 2011 at 11:59 PM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
> Consider this bit of code:
>
> extern double a[20];
>
> double test1 (int n)
> {
>  double accum = 0.0;
>  int i;
>
>  for (i=0; i<n; i++) { accum -= a[i]; }
>  accum = fabs (accum);
>  return accum;
> }
>
> which is compiled for MIPS using
>
> mipsisa32r2-sde-elf-gcc -O3 -fno-inline -fno-unroll-loops -march=74kf1_1 -S
> abstest.c
>
> With a GCC 4.6 compiler, this produces:
> ...
> .L3:
>        mtc1    $3,$f2
>        ldc1    $f0,0($5)
>        addiu   $5,$5,8
>        mtc1    $2,$f3
>        sub.d   $f2,$f2,$f0
>        mfc1    $3,$f2
>        bne     $5,$4,.L3
>        mfc1    $2,$f3
>
>        ext     $5,$2,0,31
>        move    $4,$3
> .L2:
>        mtc1    $4,$f0
>        j       $31
>        mtc1    $5,$f1
> ...
>
> This is terrible code, with all that pointless register-shuffling inside the
> loop -- what's gone wrong?  Well, the bit-twiddling expansion of "fabs"
> produced by optabs.c uses subreg expressions, and on MIPS
> CANNOT_CHANGE_MODE_CLASS disallows use of FP registers for integer
> operations.  And, when IRA sees that, it decides it cannot alloc "accum" to
> a FP reg at all, even if it obviously makes sense to put it there for the
> rest of its lifetime.
>
> On mainline trunk, things are even worse as it's spilling to memory, not
> just shuffling between registers:
>
> .L3:
>        ldc1    $f0,0($2)
>        addiu   $2,$2,8
>        sub.d   $f2,$f2,$f0
>        bne     $2,$3,.L3
>        sdc1    $f2,0($sp)
>
>        lw      $2,0($sp)
>        ext     $3,$2,0,31
>        lw      $2,4($sp)
> .L2:
>        sw      $2,4($sp)
>        sw      $3,0($sp)
>        lw      $3,4($sp)
>        lw      $2,0($sp)
>        addiu   $sp,$sp,8
>        mtc1    $3,$f0
>        j       $31
>        mtc1    $2,$f1
>
> I've been experimenting with a patch to the MIPS backend to add
> define_insn_and_split patterns for floating-point abs -- the idea is to
> attach some constraints to the insns to tell IRA it needs a GP reg for these
> operations, so it can apply its usual cost analysis and reload logic instead
> of giving up.  Then the split to introduce the subreg expansion happens
> after reload when we already have the right register class.  This seems to
> work well enough on 4.6; for this particular example, I'm getting:
>
> .L3:
>        ldc1    $f2,0($2)
>        addiu   $2,$2,8
>        bne     $2,$4,.L3
>        sub.d   $f0,$f0,$f2
>
>        mfc1    $2,$f1
>        ext     $2,$2,0,31
>        j       $31
>        mtc1    $2,$f1
>
> However, same patch on mainline is still giving spills to memory.  :-(
>
> So, here's my question.  Is it worthwhile for me to continue this approach
> of trying to make the MIPS backend smarter?  Or is the way IRA deals with
> CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need of fixing in a
> target-inspecific way?  And/or is there some other regression in IRA on
> mainline that's causing it to spill to memory when it didn't used to in 4.6?
>
> BTW, the unary "neg" operator has the same problem as "abs" on MIPS; can't
> use the hardware instruction because it does the wrong thing with NaNs, and
> can't twiddle the sign bit directly in a FP register.  With both abs/neg now
> generating unnecessary memory spills, this seems like a fairly important
> performance regression....

It sounds like IRA would benefit from properly split live-ranges here.
 You could try
to make the fabs optabs magic make sure to use a new pseudo (well, and hope
that survives ...)

Richard.

> -Sandra
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
  2011-07-28 14:24 ` Richard Guenther
@ 2011-07-28 14:47   ` Sandra Loosemore
  2011-07-28 22:47     ` Jeff Law
  0 siblings, 1 reply; 6+ messages in thread
From: Sandra Loosemore @ 2011-07-28 14:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On 07/28/2011 02:11 AM, Richard Guenther wrote:
> On Wed, Jul 27, 2011 at 11:59 PM, Sandra Loosemore
> <sandra@codesourcery.com>  wrote:
>> [snip]
>>
>> So, here's my question.  Is it worthwhile for me to continue this approach
>> of trying to make the MIPS backend smarter?  Or is the way IRA deals with
>> CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need of fixing in a
>> target-inspecific way?  And/or is there some other regression in IRA on
>> mainline that's causing it to spill to memory when it didn't used to in 4.6?
>>
> It sounds like IRA would benefit from properly split live-ranges here.
>   You could try
> to make the fabs optabs magic make sure to use a new pseudo (well, and hope
> that survives ...)
>

That was actually the first thing I tried, and it didn't work -- the new 
pseudo was eliminated well before it got to the RA.

I was thinking vaguely that this could be fixed in IRA by having an 
additional target hook to supply a reload register class to try for 
cases where CANNOT_CHANGE_MODE_CLASS is true (typically GENERAL_REGS), 
and from there making it treat this case the same as any other where it 
has to reload to satisfy the constraints of some particular instruction. 
  Given that I know nothing about IRA at this point beyond the little 
bit I picked up when trying to analyze this problem, that seems scarier 
than just fixing it in the MIPS backend, but certainly not as scary as 
adding stuff to split live ranges to IRA.  :-)

Anyone else have thoughts about the best way to proceed here?

-Sandra

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
  2011-07-28 14:47   ` Sandra Loosemore
@ 2011-07-28 22:47     ` Jeff Law
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2011-07-28 22:47 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: Richard Guenther, gcc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/28/11 08:33, Sandra Loosemore wrote:
> 

> I picked up when trying to analyze this problem, that seems scarier than
> just fixing it in the MIPS backend, but certainly not as scary as adding
> stuff to split live ranges to IRA.  :-)
Splitting live ranges in IRA is pretty easy as is getting IRA to
allocate hard regs for the split ranges.  What's hard is getting the
costing models properly updated after the range split without rescanning
the entire function.

My range splitting code works on BBs/EBBs boundaries, but it wouldn't be
that hard to add splitting within a BB.


jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOMXZNAAoJEBRtltQi2kC7HIMH/AtqJol9OpquhnN1VFCI5tyY
BC7GeqMeKU0a1Xz/qUONnK9CZghyVM4S6wqRsC5ZsmGjlDN6zjkGTNFNbM6hFxI3
iTkTz0Etf/M+tjmE89Kbyp0tq68UZuZ69UPDBX+cIZJKuzsvnnzjGDYgeUTns5Fw
+UqOzQ5PriPqsCJfdG+jjtbgyIIe6X/NChlVn3TjfT/XYuZwsgGK5IDUB7/bX0t2
YCvf/NggXJVcH7419ShiBMTD4zGe9gqYpLfn8yIdL6wQuf2xkaW3UcmmljqUQ6dp
Lp0MyBlmtiyzwF0sjkujh0B3IF2k9eXPjQ7m1l7dewfO9yAAEl+euh3ufXO5n/E=
=CP28
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
  2011-07-28  8:11 IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions? Sandra Loosemore
  2011-07-28 14:24 ` Richard Guenther
@ 2011-07-29 18:57 ` Vladimir Makarov
  2011-08-01 20:03   ` Sandra Loosemore
  1 sibling, 1 reply; 6+ messages in thread
From: Vladimir Makarov @ 2011-07-29 18:57 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc

On 07/27/2011 05:59 PM, Sandra Loosemore wrote:
> Consider this bit of code:
>
> extern double a[20];
>
> double test1 (int n)
> {
>   double accum = 0.0;
>   int i;
>
>   for (i=0; i<n; i++) { accum -= a[i]; }
>   accum = fabs (accum);
>   return accum;
> }
>
> which is compiled for MIPS using
>
> mipsisa32r2-sde-elf-gcc -O3 -fno-inline -fno-unroll-loops 
> -march=74kf1_1 -S abstest.c
>
> With a GCC 4.6 compiler, this produces:
> ...
> .L3:
>     mtc1    $3,$f2
>     ldc1    $f0,0($5)
>     addiu    $5,$5,8
>     mtc1    $2,$f3
>     sub.d    $f2,$f2,$f0
>     mfc1    $3,$f2
>     bne    $5,$4,.L3
>     mfc1    $2,$f3
>
>     ext    $5,$2,0,31
>     move    $4,$3
> .L2:
>     mtc1    $4,$f0
>     j    $31
>     mtc1    $5,$f1
> ...
>
> This is terrible code, with all that pointless register-shuffling 
> inside the loop -- what's gone wrong?  Well, the bit-twiddling 
> expansion of "fabs" produced by optabs.c uses subreg expressions, and 
> on MIPS CANNOT_CHANGE_MODE_CLASS disallows use of FP registers for 
> integer operations.  And, when IRA sees that, it decides it cannot 
> alloc "accum" to a FP reg at all, even if it obviously makes sense to 
> put it there for the rest of its lifetime.
>
> On mainline trunk, things are even worse as it's spilling to memory, 
> not just shuffling between registers:
>
> .L3:
>     ldc1    $f0,0($2)
>     addiu    $2,$2,8
>     sub.d    $f2,$f2,$f0
>     bne    $2,$3,.L3
>     sdc1    $f2,0($sp)
>
>     lw    $2,0($sp)
>     ext    $3,$2,0,31
>     lw    $2,4($sp)
> .L2:
>     sw    $2,4($sp)
>     sw    $3,0($sp)
>     lw    $3,4($sp)
>     lw    $2,0($sp)
>     addiu    $sp,$sp,8
>     mtc1    $3,$f0
>     j    $31
>     mtc1    $2,$f1
>
> I've been experimenting with a patch to the MIPS backend to add 
> define_insn_and_split patterns for floating-point abs -- the idea is 
> to attach some constraints to the insns to tell IRA it needs a GP reg 
> for these operations, so it can apply its usual cost analysis and 
> reload logic instead of giving up.  Then the split to introduce the 
> subreg expansion happens after reload when we already have the right 
> register class.  This seems to work well enough on 4.6; for this 
> particular example, I'm getting:
>
> .L3:
>     ldc1    $f2,0($2)
>     addiu    $2,$2,8
>     bne    $2,$4,.L3
>     sub.d    $f0,$f0,$f2
>
>     mfc1    $2,$f1
>     ext    $2,$2,0,31
>     j    $31
>     mtc1    $2,$f1
>
> However, same patch on mainline is still giving spills to memory.  :-(
>
> So, here's my question.  Is it worthwhile for me to continue this 
> approach of trying to make the MIPS backend smarter?  Or is the way 
> IRA deals with CANNOT_CHANGE_MODE_CLASS fundamentally broken and in 
> need of fixing in a target-inspecific way?  And/or is there some other 
> regression in IRA on mainline that's causing it to spill to memory 
> when it didn't used to in 4.6?
>
I think the second ("fixing in a target-inspecific way").  Instead of 
prohibiting class for a pseudo (that what is happening for class 
FP_REGS) because the pseudo can change its mode, impossibility of 
changing mode should be reflected in the class cost (through some reload 
cost evaluation).

I'll try to fix it.  The only problem is that it will take sometime 
because the fix should be tested on a few platforms.  It would be nice 
to make PR not to forget about the problem.
> BTW, the unary "neg" operator has the same problem as "abs" on MIPS; 
> can't use the hardware instruction because it does the wrong thing 
> with NaNs, and can't twiddle the sign bit directly in a FP register.  
> With both abs/neg now generating unnecessary memory spills, this seems 
> like a fairly important performance regression....

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
  2011-07-29 18:57 ` Vladimir Makarov
@ 2011-08-01 20:03   ` Sandra Loosemore
  0 siblings, 0 replies; 6+ messages in thread
From: Sandra Loosemore @ 2011-08-01 20:03 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc

On 07/29/2011 12:13 PM, Vladimir Makarov wrote:
> On 07/27/2011 05:59 PM, Sandra Loosemore wrote:
>>
>> [snip]
>>
>> So, here's my question. Is it worthwhile for me to continue this
>> approach of trying to make the MIPS backend smarter? Or is the way IRA
>> deals with CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need
>> of fixing in a target-inspecific way? And/or is there some other
>> regression in IRA on mainline that's causing it to spill to memory
>> when it didn't used to in 4.6?
>>
> I think the second ("fixing in a target-inspecific way"). Instead of
> prohibiting class for a pseudo (that what is happening for class
> FP_REGS) because the pseudo can change its mode, impossibility of
> changing mode should be reflected in the class cost (through some reload
> cost evaluation).
>
> I'll try to fix it. The only problem is that it will take sometime
> because the fix should be tested on a few platforms. It would be nice to
> make PR not to forget about the problem.

Thanks for offering to look into this.  I created

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936

with my test case and WIP patch for the MIPS backend.  At this point I'm 
thinking that the additional memory spills I'm seeing on mainline are 
not related to CANNOT_CHANGE_MODE_CLASS at all but are just some other 
regression in the register allocator compared to 4.6.  It might be 
useful to try to confirm/isolate that problem first.

-Sandra

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-08-01 20:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-28  8:11 IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions? Sandra Loosemore
2011-07-28 14:24 ` Richard Guenther
2011-07-28 14:47   ` Sandra Loosemore
2011-07-28 22:47     ` Jeff Law
2011-07-29 18:57 ` Vladimir Makarov
2011-08-01 20:03   ` Sandra Loosemore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).