Re: Optimization problem on PPC when casting integers to floats

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: Optimization problem on PPC when casting integers to floats
@ 2000-05-30 18:58 Timothy J. Wood
  0 siblings, 0 replies; 7+ messages in thread
From: Timothy J. Wood @ 2000-05-30 18:58 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc, David Edelsohn, shebs

From: Geoff Keating <geoffk@cygnus.com>
>I think it might help if it instead generated
>  [(set (match_dup 7) (match_dup 2))
>   (parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
>		   (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
>	      (use (match_dup 3))
>	      (use (match_dup 4))
>	      (clobber (match_dup 8))
>	      (clobber (match_dup 5))
>	      (clobber (match_dup 6))])]



  OK, so in the Darwin repository, cc/cc/config/rs6000/rs6000.md has the  
following:

;; Conversions to and from floating-point.
(define_expand "floatsidf2"
  [(parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
		   (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
	      (use (match_dup 2))
	      (use (match_dup 3))
	      (clobber (match_dup 4))
	      (clobber (match_dup 5))
	      (clobber (reg:DF 76))])]
  "! TARGET_POWERPC64 && TARGET_HARD_FLOAT"
  "
{
  operands[2] = force_reg (SImode, GEN_INT (0x43300000));
  operands[3] = force_reg (DFmode, rs6000_float_const  
(\"4503601774854144\", DFmode));
  operands[4] = gen_reg_rtx (SImode);
  operands[5] = gen_reg_rtx (Pmode);
}")


  On the head of the GCC repository, we have:

(define_expand "floatsidf2"
  [(parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
		   (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
	      (use (match_dup 2))
	      (use (match_dup 3))
	      (clobber (match_dup 4))
	      (clobber (match_dup 5))
	      (clobber (match_dup 6))])]
  "! TARGET_POWERPC64 && TARGET_HARD_FLOAT"
  "
{
  operands[2] = force_reg (SImode, GEN_INT (0x43300000));
  operands[3] = force_reg (DFmode, rs6000_float_const  
(\"4503601774854144\", DFmode));
  operands[4] = assign_stack_temp (DFmode, GET_MODE_SIZE (DFmode), 0);
  operands[5] = gen_reg_rtx (DFmode);
  operands[6] = gen_reg_rtx (SImode);
}")


  I'm not a .md file hacker (I'm not much of a gcc hacker at all), so if  
anyone could suggest a replacement for the section of the .md file that I  
can copy into the _Darwin_ repository version, I'd be happy to test to see  
if this approach works.  Of course, I'm stuck using the Darwin gcc right  
now (2.92.2 based) so if this fix depends upon some newer alias stuff or  
something, it might not work for me even if the fix is valid.

-tim

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimization problem on PPC when casting integers to floats
  2000-05-20 19:19 Timothy J. Wood
  2000-05-22 11:29 ` Geoff Keating
@ 2000-08-25 17:02 ` Geoff Keating
  1 sibling, 0 replies; 7+ messages in thread
From: Geoff Keating @ 2000-08-25 17:02 UTC (permalink / raw)
  To: tjw; +Cc: gcc

"Timothy J. Wood" <bungi@omnigroup.com> writes:

>   More important is the second problem.  When building the double on the =
> stack, there are two words, one that is constant throughout the loop and =
> one that needs to be updated for each value.  But, gcc doesn't realize =
> that it can avoid rewriting the constant part of the double on each =
> iteration of the loop and in most of the cases above, ends up issuing =
> twice as many store instructions as necessary.

[Working through my old mail...]

GCC doesn't have any code to perform the optimisation of
hoisting constant stores out of loops.  It can sink them, but in this
case that would make the code not work.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimization problem on PPC when casting integers to floats
@ 2000-05-30 19:51 Mike Stump
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Stump @ 2000-05-30 19:51 UTC (permalink / raw)
  To: geoffk, tjw; +Cc: dje, gcc, shebs

> To: Geoff Keating <geoffk@cygnus.com>
> Date: Tue, 30 May 2000 18:58:04 -0700
> From: "Timothy J. Wood" <bungi@omnigroup.com>

> I'm not a .md file hacker (I'm not much of a gcc hacker at all), so
> if anyone could suggest a replacement for the section of the .md
> file that I can copy into the _Darwin_ repository version, I'd be
> happy to test to see if this approach works.

It is generally beyond the scope of our work to work on other peoples
versions of gcc.  You should instead, ask them on their mailing lists
to do this.  We assume that you can update to the latest gcc, and if
they bug occurs there, you can report it, and we should fix it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimization problem on PPC when casting integers to floats
@ 2000-05-22 12:43 Timothy J. Wood
  0 siblings, 0 replies; 7+ messages in thread
From: Timothy J. Wood @ 2000-05-22 12:43 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc, David Edelsohn

From: Geoff Keating <geoffk@cygnus.com>
>>   But in the case of casting unsigned shorts to floats, gcc goes bonkers =
>> and ends up doing part of the conversion twice and throwing away the =
>> results.
>Could you be more specific?  I think this is fixed in the lastest
>sources, but I'm not sure.


From this test function:

double test1(unsigned short *stuff)
{
unsigned int i;
double x = 0;
for (i = 0; i < 1000; i++)
x += stuff[i];
return x;
}

I was getting the following code (under MacOS X DP4):

_test1:
lis r7,ha16(LC0)
la r7,lo16(LC0)(r7)
lfd f1,0(r7)
lis r11,0x4330
lis r9,0x4330
lis r10,0x8000
li r8,1000
mtctr r8
L8:
lhz r0,0(r3)
addi r3,r3,2
stw r9,-16(r1)
stw r10,-12(r1)
lfd f13,-16(r1)
xoris r8,r0,0x8000
stw r8,-4(r1)
stw r11,-8(r1)
lfd f0,-8(r1)
fsub f0,f0,f13
fadd f1,f1,f0
bdnz L8
blr


Notice in the loop that there are apparently two copies of the double on the stack (offsets -4, -8, -12 and -16 are all written), but only one double is read (at offset -8).

From the test results that I forwarded from Steven G. Johnson for Linux PPC with 2.95.2 and a recent snapshot, it looks like this case is now only as bad as the others (it only has one extra write instead of three).  The assembly that Steven mailed to me was as follows:

gcc-2.95.2 produces the following:

test1:
stwu 1,-16(1)
lis 9,.LC0@ha
la 9,.LC0@l(9)
lfd 1,0(9)
lis 9,.LC1@ha
li 0,1000
la 9,.LC1@l(9)
mtctr 0
lfd 13,0(9)
lis 11,0x4330
..L8:
lhz 0,0(3)
xoris 0,0,0x8000
stw 0,12(1)
stw 11,8(1)
lfd 0,8(1)
addi 3,3,2
fsub 0,0,13
fadd 1,1,0
bdnz .L8
la 1,16(1)
blr


The CVS gcc produces this:

test1:
li 0,1000
lis 11,.LC1@ha
mtctr 0
lis 9,.LC0@ha
la 11,.LC1@l(11)
la 9,.LC0@l(9)
lfd 13,0(11)
lfd 1,0(9)
lis 9,0x4330
stwu 1,-16(1)
..L9:
lhz 0,0(3)
addi 3,3,2
stw 9,8(1)
xoris 0,0,0x8000
stw 0,12(1)
lfd 0,8(1)
fsub 0,0,13
fadd 1,1,0
bdnz .L9
addi 1,1,16
blr



-tim

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimization problem on PPC when casting integers to floats
  2000-05-20 19:19 Timothy J. Wood
@ 2000-05-22 11:29 ` Geoff Keating
  2000-08-25 17:02 ` Geoff Keating
  1 sibling, 0 replies; 7+ messages in thread
From: Geoff Keating @ 2000-05-22 11:29 UTC (permalink / raw)
  To: tjw; +Cc: gcc, David Edelsohn

"Timothy J. Wood" <bungi@omnigroup.com> writes:

>   More important is the second problem.  When building the double on the =
> stack, there are two words, one that is constant throughout the loop and =
> one that needs to be updated for each value.  But, gcc doesn't realize =
> that it can avoid rewriting the constant part of the double on each =
> iteration of the loop and in most of the cases above, ends up issuing =
> twice as many store instructions as necessary.

OK, I think I can see how to fix this.

At present, floatsidf2 generates this:

  [(parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
		   (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
	      (use (match_dup 2))
	      (use (match_dup 3))
	      (clobber (match_dup 4))
	      (clobber (match_dup 5))
	      (clobber (match_dup 6))])]

I think it might help if it instead generated
  [(set (match_dup 7) (match_dup 2))
   (parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
		   (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
	      (use (match_dup 3))
	      (use (match_dup 4))
	      (clobber (match_dup 8))
	      (clobber (match_dup 5))
	      (clobber (match_dup 6))])]

where 'match_dup 7' and 'match_dup 8' are SImode versions of the
stack temporary in operands[4]; operands[7] is the part that stays
constant, and operands[8] is the part that gets changed.

Then the backend has a chance to hoist the SET, assuming its alias
analysis is good enough (which I think it is).

I can't think of a reason to do this for any of the other
instructions... Perhaps for the XOR.  I think we don't want to expose
the second memory store that early.

>   But in the case of casting unsigned shorts to floats, gcc goes bonkers =
> and ends up doing part of the conversion twice and throwing away the =
> results.

Could you be more specific?  I think this is fixed in the lastest
sources, but I'm not sure.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimization problem on PPC when casting integers to floats
  2000-05-21 12:32 Fwd: " Timothy J. Wood
@ 2000-05-21 14:49 ` David Edelsohn
  0 siblings, 0 replies; 7+ messages in thread
From: David Edelsohn @ 2000-05-21 14:49 UTC (permalink / raw)
  To: tjw; +Cc: gcc

	This is a fundamental artifact of the way that the int to FP
conversion is designed in the PowerPC port of GCC.  Current the backend
believes that the stack slot is modified in each iteration and that the
value cannot be preserved.

	This is not correct and should be fixed, but it will take time and
study to understand how to restructure the GCC machine description to
avoid this problem.  The machine description needs to assert that the
unchanging part of the stack word truly is this constant value which can
be hoisted out of the loop.

	Thanks for pointing it out.  Don't expect an overnight patch to
solve it.  Hopefully it can be address in the forthcoming gcc-3.0 release.

Thanks, David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Optimization problem on PPC when casting integers to floats
@ 2000-05-20 19:19 Timothy J. Wood
  2000-05-22 11:29 ` Geoff Keating
  2000-08-25 17:02 ` Geoff Keating
  0 siblings, 2 replies; 7+ messages in thread
From: Timothy J. Wood @ 2000-05-20 19:19 UTC (permalink / raw)
  To: gcc

[Sorry about the repeat -- I forgot to fix the subject line on the previous copy.  I'm sending this again with a good subject line to the appropriate people will read it].

  I'm working on the Quake3:Arena port for MacOS X DP4.  This newly released version supposedly contains a compiler based off of gcc 2.95.2 (at least cc -v claims something to that effect).

  The PPC is really poor at converting ints to floats, there is an optimization problem that is making this even worse.  One of the current hot spots in Quake3:Arena exposes this problem under MacOS X DP4.  I don't have a Linux PPC system to test this on unfortunately (maybe I'll reformat my PowerBook to have more partitions in the future :).

  Some code that demonstrates the problem follows.  I'm using doubles in this code to make the example assembly simpler (avoids rounding the double to a float).

---- cast.c ----

double test1(unsigned short *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

double test2(short *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

double test3(char *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

double test4(unsigned char *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

typedef struct _foo {
    unsigned int i:18;
    unsigned int j:14;
} foo;

double test5(foo *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i].i;
    return x;
}

---- end cast.c ----

  I compile this test case with:

	cc -static -O2 -S cast.c

  (the -static avoids a bunch of Mach-O PIC crud that isn't relevant to the problem), I get something like the following:

.data
.const
        .align 2
LC0:
        .double 0r0.00000000000000000000e0
.text
        .align 2
.globl _test1
_test1:
        lis r7,ha16(LC0)
        la r7,lo16(LC0)(r7)
        lfd f1,0(r7)
        lis r11,0x4330
        lis r9,0x4330
        lis r10,0x8000
        li r8,1000
        mtctr r8
L8:
        lhz r0,0(r3)
        addi r3,r3,2
        stw r9,-16(r1)
        stw r10,-12(r1)
        lfd f13,-16(r1)
        xoris r8,r0,0x8000
        stw r8,-4(r1)
        stw r11,-8(r1)
        lfd f0,-8(r1)
        fsub f0,f0,f13
        fadd f1,f1,f0
        bdnz L8
        blr

  There are two problems here.  First, is that the various constants are duplicated for each function (there are more constants for some of the types).  Maybe the linker cleans this up -- I don't know.

  More important is the second problem.  When building the double on the stack, there are two words, one that is constant throughout the loop and one that needs to be updated for each value.  But, gcc doesn't realize that it can avoid rewriting the constant part of the double on each iteration of the loop and in most of the cases above, ends up issuing twice as many store instructions as necessary.

  But in the case of casting unsigned shorts to floats, gcc goes bonkers and ends up doing part of the conversion twice and throwing away the results.

  Can anyone replicate this result on a PPC Linux box with the latest sources?  Either way, hopefully Apple will be able to fix these problems so that their Quake port for MacOS X will run faster :)

-tim

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2000-08-25 17:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-05-30 18:58 Optimization problem on PPC when casting integers to floats Timothy J. Wood
  -- strict thread matches above, loose matches on Subject: below --
2000-05-30 19:51 Mike Stump
2000-05-22 12:43 Timothy J. Wood
2000-05-21 12:32 Fwd: " Timothy J. Wood
2000-05-21 14:49 ` David Edelsohn
2000-05-20 19:19 Timothy J. Wood
2000-05-22 11:29 ` Geoff Keating
2000-08-25 17:02 ` Geoff Keating

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).