Re: rs6000.md/altivec.md problem in setting of vector registers

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: rs6000.md/altivec.md problem in setting of vector registers
       [not found] <OF6E029669.0E5E3F05-ONC2256E5C.000482C4-C2256E5C.0004D314@il.ibm.com>
@ 2004-03-19  8:45 ` David Edelsohn
  2004-03-21  0:36   ` Dorit Naishlos
  2004-03-19 20:59 ` Dale Johannesen
  1 sibling, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2004-03-19  8:45 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc

	Did you examine the effect of adding vec_set and vec_extract
patterns?

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
       [not found] <OF6E029669.0E5E3F05-ONC2256E5C.000482C4-C2256E5C.0004D314@il.ibm.com>
  2004-03-19  8:45 ` rs6000.md/altivec.md problem in setting of vector registers David Edelsohn
@ 2004-03-19 20:59 ` Dale Johannesen
  2004-03-21  1:47   ` Dorit Naishlos
  1 sibling, 1 reply; 17+ messages in thread
From: Dale Johannesen @ 2004-03-19 20:59 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc, David Edelsohn, Dale Johannesen

On Mar 18, 2004, at 4:52 PM, Dorit Naishlos wrote:
> I managed to hack something that causes Reload to make the "right" 
> decision
> and generate the same code as it generates for i386 - i.e., the spill 
> code
> is created out of the loop.
>
> The hack modifies the macro CANNOT_CHANGE_MODE_CLASS in rs6000.h to not
> allow a mode change from a vector mode to a smaller mode in 
> ALTIVEC_REGS or
> GENERAL_REGS. As a result, these register classes cannot be considered 
> for
> allocation in this case; instead, Reload directly creates stores to 
> memory,
> outside the loop, like for i386.

I haven't tried it, but this might well break passing of vector 
parameters in int regs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-19  8:45 ` rs6000.md/altivec.md problem in setting of vector registers David Edelsohn
@ 2004-03-21  0:36   ` Dorit Naishlos
  2004-03-23 22:10     ` David Edelsohn
  0 siblings, 1 reply; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-21  0:36 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc


No. I focused on understanding what in the machine description explains the
different ways Reload handles the same pattern ('set subreg') on the two
platforms (i386/powerpc).

dorit



                                                                                                                                       
                      David Edelsohn                                                                                                   
                      <dje@makai.watson        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                                
                      .ibm.com>                cc:       gcc@gcc.gnu.org                                                               
                                               Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers               
                      19/03/2004 08:09                                                                                                 
                                                                                                                                       




             Did you examine the effect of adding vec_set and vec_extract
patterns?

David



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-19 20:59 ` Dale Johannesen
@ 2004-03-21  1:47   ` Dorit Naishlos
  0 siblings, 0 replies; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-21  1:47 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: gcc, David Edelsohn


> I haven't tried it, but this might well break passing of vector
> parameters in int regs.

I would be surprised if this hack did not create all kinds of code
generation problems. Maybe it could be reworked or perhaps something else
could be tweaked to achieve the same effect?

dorit




                                                                                                                                       
                      Dale Johannesen                                                                                                  
                      <dalej@apple.com>        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                                
                                               cc:       gcc@gcc.gnu.org, David Edelsohn <dje@makai.watson.ibm.com>, Dale Johannesen   
                      19/03/2004 20:55          <dalej@apple.com>                                                                      
                                               Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers               
                                                                                                                                       




On Mar 18, 2004, at 4:52 PM, Dorit Naishlos wrote:
> I managed to hack something that causes Reload to make the "right"
> decision
> and generate the same code as it generates for i386 - i.e., the spill
> code
> is created out of the loop.
>
> The hack modifies the macro CANNOT_CHANGE_MODE_CLASS in rs6000.h to not
> allow a mode change from a vector mode to a smaller mode in
> ALTIVEC_REGS or
> GENERAL_REGS. As a result, these register classes cannot be considered
> for
> allocation in this case; instead, Reload directly creates stores to
> memory,
> outside the loop, like for i386.

I haven't tried it, but this might well break passing of vector
parameters in int regs.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-23 22:10     ` David Edelsohn
@ 2004-03-23 17:03       ` Dorit Naishlos
  0 siblings, 0 replies; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-23 17:03 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc


I won't be able to dedicate much time at the moment to this rs6000 code
generation problem. I've included a test case that displays it, in case any
one would like to look into it:

typedef int __attribute__((mode(V4SI))) v4si;
typedef int aint __attribute__ ((__aligned__(16)));
#define N 1024
typedef union {
   aint a[N];
   v4si pa[N/4];
} vec_union;

void
foo (short n){
  vec_union a;
  v4si va = {n,n,n,n};
  int i;

  for (i=0; i<N/4; i++){
    a.pa[i] = va;
  }
}

Below is the code that is being generated on powerpc and i386.

dorit


This is the code generated on powerpc, compiling with -O3 -floop-optimize2
-maltivec:
(relevant code marked with ">>";
4 scalar stores + 1 vector load, all invariant, all in the loop)

foo:
        mfspr r5,256
        oris r12,r5,0x8000
        stw r5,-8(r1)
        mtspr 256,r12
        li r0,256
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        mr r9,r3
        mflr r31
        mr r10,r3
        stw r4,8(r1)
        mr r11,r3
        mr r12,r3
        mtctr r0
        addis r2,r31,ha16(L_a$non_lazy_ptr-"L00000000001$pb")
        lwz r8,lo16(L_a$non_lazy_ptr-"L00000000001$pb")(r2)
        li r2,0
L4:
        addi r7,r1,-32
        slwi r3,r2,4
>>      stw r9,0(r7)
        addi r2,r2,1
>>      stw r10,4(r7)
>>      stw r11,8(r7)
>>      stw r12,12(r7)
>>      lvx v0,0,r7
        stvx v0,r3,r8
        bdnz L4

        lwz r8,-8(r1)
        mtspr 256,r8
        lwz r6,8(r1)
        lwz r31,-4(r1)
        mtlr r6
        blr


This is the code generated on i386, compiling with -O3 -msse2:
(relevant code marked with ">>";
4 scalar stores out of the loop. 1 invariant vector load, in the loop)

foo:
        pushl   %ebp
        xorl    %edx, %edx
        movl    %esp, %ebp
        subl    $24, %esp
        movswl  8(%ebp),%eax
>>      movl    %eax, -24(%ebp)
>>      movl    %eax, -20(%ebp)
>>      movl    %eax, -16(%ebp)
>>      movl    %eax, -12(%ebp)
        movdqa  -24(%ebp), %xmm0
        .p2align 4,,15
.L5:
>>      movl    %edx, %ecx
        incl    %edx
        sall    $4, %ecx
        movdqa  %xmm0, a(%ecx)
        cmpl    $255, %edx
        jle     .L5

        leave
        ret




                                                                                                                                   
                      David Edelsohn                                                                                               
                      <dje@makai.watson        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                            
                      .ibm.com>                cc:       gcc@gcc.gnu.org                                                           
                                               Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers           
                      21/03/2004 00:18                                                                                             
                                                                                                                                   




>>>>> Dorit Naishlos writes:

Dorit> I focused on understanding what in the machine description explains
the
Dorit> different ways Reload handles the same pattern ('set subreg') on the
two
Dorit> platforms (i386/powerpc).

             Altivec and SSE are integrated in their respective
architectures
in different ways, so GCC of one is not alway appropriate for the other.
The vec_set and vec_extract patterns provide explicit control over setting
vector elements, so that probably is the best way to achieve the optimal
behavior.

David



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-21  0:36   ` Dorit Naishlos
@ 2004-03-23 22:10     ` David Edelsohn
  2004-03-23 17:03       ` Dorit Naishlos
  0 siblings, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2004-03-23 22:10 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc

>>>>> Dorit Naishlos writes:

Dorit> I focused on understanding what in the machine description explains the
Dorit> different ways Reload handles the same pattern ('set subreg') on the two
Dorit> platforms (i386/powerpc).

	Altivec and SSE are integrated in their respective architectures
in different ways, so GCC of one is not alway appropriate for the other.
The vec_set and vec_extract patterns provide explicit control over setting
vector elements, so that probably is the best way to achieve the optimal
behavior.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-11 23:31       ` Richard Henderson
@ 2004-03-12  3:14         ` David Edelsohn
  0 siblings, 0 replies; 17+ messages in thread
From: David Edelsohn @ 2004-03-12  3:14 UTC (permalink / raw)
  To: Richard Henderson, Dorit Naishlos, Dale Johannesen, gcc

>>>>> Richard Henderson writes:

Richard> See the vec_set<mode> and vec_extract<mode> patterns that Jan added.

	Okay, thanks.  It would be nice if these new patterns were
documented. 

Thanks, David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-11 22:38     ` David Edelsohn
@ 2004-03-11 23:31       ` Richard Henderson
  2004-03-12  3:14         ` David Edelsohn
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-03-11 23:31 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Dorit Naishlos, Dale Johannesen, gcc

On Thu, Mar 11, 2004 at 05:38:24PM -0500, David Edelsohn wrote:
> GCC commits to an poor instruction selection choice and it's downhill from
> there.  It probably would be better to fix vector initialization, or at
> least improve the current hack, instead of trying to change register
> allocator behavior.

See the vec_set<mode> and vec_extract<mode> patterns that Jan added.


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-05  0:06   ` Dorit Naishlos
  2004-03-05  0:23     ` Dale Johannesen
@ 2004-03-11 22:38     ` David Edelsohn
  2004-03-11 23:31       ` Richard Henderson
  1 sibling, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2004-03-11 22:38 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: Dale Johannesen, gcc

>>>>> Dorit Naishlos writes:

Dorit> Thanks, this indeed looks like the right direction, and it seems to support
Dorit> the original suspicion that there is a problem with the "insvsi" pattern of
Dorit> the insns that are used to initialize the va pseudo:

	I think the problems with register allocation are a byproduct of
the earlier decision by GCC to use a bitfield to initialize the value.

	Because the initializer is an argument to the function, GCC
chooses the CONSTRUCTOR path to materialize it.  The comment in
expand_expr_real() expresses it as well as anything:

        FIXME: Avoid trying to fill vector constructors piece-meal.
        Output them with output_constant_def below unless we're sure
        they're zeros.  This should go away when vector initializers
        are treated like VECTOR_CST instead of arrays.

GCC commits to an poor instruction selection choice and it's downhill from
there.  It probably would be better to fix vector initialization, or at
least improve the current hack, instead of trying to change register
allocator behavior.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-05  0:23     ` Dale Johannesen
@ 2004-03-09 18:46       ` David Edelsohn
  0 siblings, 0 replies; 17+ messages in thread
From: David Edelsohn @ 2004-03-09 18:46 UTC (permalink / raw)
  To: Dorit Naishlos, Dale Johannesen; +Cc: gcc

> There are several cost-computation functions in rs6000.c, especially
> rs6000_register_move_cost looks promising.  Doesn't the RA call into there
> at some point?  Seems like it should.  That would be the right place, if
> it does.

	REGISTER_MOVE_COST only is provided the mode and register classes,
not the actual instruction, so it has no knowledge about the operands.
The costs is not wrong, but the RA is inquiring about the wrong mode.
Either RA needs to ask about the correct mode or insv needs to generate
patterns for vector types.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-03 18:44 ` Dale Johannesen
  2004-03-05  0:06   ` Dorit Naishlos
@ 2004-03-07 18:30   ` Aldy Hernandez
  1 sibling, 0 replies; 17+ messages in thread
From: Aldy Hernandez @ 2004-03-07 18:30 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: Dorit Naishlos, gcc

>>>>> "Dale" == Dale Johannesen <dalej@apple.com> writes:

 > This sort of thing generally violates strict aliasing rules, and has a
 > tendency
 > not to work as expected when scheduled.   Vectors are outside the
 > standard,
 > but I don't think there's any special handling for aliasing them, is
 > there Aldy?

No.

Aldy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-05  0:06   ` Dorit Naishlos
@ 2004-03-05  0:23     ` Dale Johannesen
  2004-03-09 18:46       ` David Edelsohn
  2004-03-11 22:38     ` David Edelsohn
  1 sibling, 1 reply; 17+ messages in thread
From: Dale Johannesen @ 2004-03-05  0:23 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc

On Mar 4, 2004, at 4:09 PM, Dorit Naishlos wrote:
>> Those loads and stores in the loop are happening because RA
>> chose to assign int registers to the va pseudo.  This is obviously
>> not a good idea and suggests some problem in computing the register
>> costs.
>
> Thanks, this indeed looks like the right direction, and it seems to 
> support
> the original suspicion that there is a problem with the "insvsi" 
> pattern of
> the insns that are used to initialize the va pseudo:
>
> (insn:HI 12 11 13 0 (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ 
> va
> ]) 0)
>             (const_int 16 [0x10])
>             (const_int 0 [0x0]))
>         (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 11 (insn_list 3
> (nil)))
>     (nil))
>
> When the RA calculates the cost of assigning an ALTIVEC_REG to 
> temporary
> 120 in any of these insns, it checks the related cost using the 
> relevant
> machine mode of the operand in question. The relevant mode should be 
> V8HI.
> However, the "zero_extract" part of the "insvsi" pattern "hides" that, 
> and
> instead, the RA calculates the cost of assigning an ALTIVEC_REG to an
> *SImode* operand. The default for such "impossible" assignments is 
> 10,000.

I don't know if it would be more appropriate to address this in the RA 
code or
the patterns.  Somebody who understands RA better can probably tell you.

There are several cost-computation functions in rs6000.c, especially
rs6000_register_move_cost looks promising.  Doesn't the RA call into
there at some point?  Seems like it should.  That would be the right 
place,
if it does.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-03 18:44 ` Dale Johannesen
@ 2004-03-05  0:06   ` Dorit Naishlos
  2004-03-05  0:23     ` Dale Johannesen
  2004-03-11 22:38     ` David Edelsohn
  2004-03-07 18:30   ` Aldy Hernandez
  1 sibling, 2 replies; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-05  0:06 UTC (permalink / raw)
  To: Dale Johannesen, gcc


> Those loads and stores in the loop are happening because RA
> chose to assign int registers to the va pseudo.  This is obviously
> not a good idea and suggests some problem in computing the register
> costs.

Thanks, this indeed looks like the right direction, and it seems to support
the original suspicion that there is a problem with the "insvsi" pattern of
the insns that are used to initialize the va pseudo:

(insn:HI 12 11 13 0 (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va
]) 0)
            (const_int 16 [0x10])
            (const_int 0 [0x0]))
        (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 11 (insn_list 3
(nil)))
    (nil))

(insn:HI 13 12 14 0 (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va
]) 0)
            (const_int 16 [0x10])
            (const_int 16 [0x10]))
        (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 12 (insn_list 3
(nil)))
    (nil))

(insn:HI 14 13 15 0 (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va
]) 4)
            (const_int 16 [0x10])
            (const_int 0 [0x0]))
        (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 13 (insn_list 3
(nil)))
    (nil))

.....

When the RA calculates the cost of assigning an ALTIVEC_REG to temporary
120 in any of these insns, it checks the related cost using the relevant
machine mode of the operand in question. The relevant mode should be V8HI.
However, the "zero_extract" part of the "insvsi" pattern "hides" that, and
instead, the RA calculates the cost of assigning an ALTIVEC_REG to an
*SImode* operand. The default for such "impossible" assignments is 10,000.

Is there a way to express a vector pseudo initialization without the
zero_extract, or in a way that exposes the relevant mode (V8HI)? should we
change the insvsi pattern, or define a new pattern to be generated in this
case?

A little more details below.

thanks,
dorit

The reg allocator calculates the cost of assigning a GENERAL_REG to the
vector temporary 120 as follows: each of the 8 "insvsi" insns contributes 0
to the overall cost; the stvx insn in the loop however contributes a cost
of 20:

OVERALL_COST(GENERAL_REG) = 8 * (0 * freq_out_of_loop) + (20 *
freq_in_loop)
OVERALL_COST(GENERAL_REG) = 8 * (0 * 7) + (20 * 1000) = 20,000

The cost '20' is obtained by checking the cost of moving data of a V8HImode
operand from a GENERAL_REG into an ALTIVEC_REG. The overall cost of
assigning an ALTIVEC_REG should therefore be as follows:

OVERALL_COST(ALTIVEC_REG) = 8 * (20 * freq_out_of_loop) + (0 *
freq_in_loop)
OVERALL_COST(ALTIVEC_REG) = 8 * (20 * 7) + (0 * 1000) = 1,120

However, because the operand in the "insvsi" pattern has an SImode, the
cost that is used instead is of moving SImode data from a GENERAL_REG into
an ALTIVEC_REG which is set to the maximum cost (defaults to 10,000). The
outcome is:

OVERALL_COST(ALTIVEC_REG) = 8 * (10,000 * freq_out_of_loop) + (0 *
freq_in_loop)
OVERALL_COST(ALTIVEC_REG) = 8 * (10,000 * 7) + (0 * 1000) = 56,000



                                                                                                                                       
                      Dale Johannesen                                                                                                  
                      <dalej@apple.com>        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                                
                                               cc:       gcc@gcc.gnu.org                                                               
                      03/03/2004 20:42         Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers               
                                                                                                                                       





On Mar 3, 2004, at 8:56 AM, Dorit Naishlos wrote:

> Hi,
>
> I think there is a problem in the way we model the setting of subregs
> (insn
> "insvsi") in rs6000, or rather - a problem in the way reload phase
> handles
> these patterns when they are generated to express an initialization of
> a
> vector register. Consider the following example:
>
> typedef int __attribute__((mode(V8HI))) v8hi;
> #define N 1024
> void foo5 (short n){
>     short a[N];
>     v8hi *pa = (v8hi *)a;
>>>  v8hi va = {n,n,n,n,n,n,n,n};
>     int i;
>
>     for (i=0; i<N/8; i++){
>       pa[i] = va;
>     }
>     bar1 (pa[2]);
> }

Those loads and stores in the loop are happening because RA
chose to assign int registers to the va pseudo.  This is obviously
not a good idea and suggests some problem in computing the register
costs.

  Register 120 costs: BASE_REGS:20000 GENERAL_REGS:20000
ALTIVEC_REGS:560000
SPEC_OR_GEN_REGS:2048 NON_FLOAT_REGS:20448 MEM:4896

That (from the .lreg dump) doesn't look right, I suggest investigating
why the
Altivec number is so high.  Comparison with x86 may be helpful.

A couple of observations that won't help with your main problem:
>     short a[N];
>     v8hi *pa = (v8hi *)a;
This sort of thing generally violates strict aliasing rules, and has a
tendency
not to work as expected when scheduled.   Vectors are outside the
standard,
but I don't think there's any special handling for aliasing them, is
there Aldy?
I don't see why pa will necessarily by 16-byte aligned, either,
although it always
was when I tried.

The shift-i-and-add inside the loop isn't strength reduced, and should
be.
It is if you change pretty much anything, though, so it's hitting some
edge case.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-03 16:46 Dorit Naishlos
  2004-03-03 17:52 ` David Edelsohn
@ 2004-03-03 18:44 ` Dale Johannesen
  2004-03-05  0:06   ` Dorit Naishlos
  2004-03-07 18:30   ` Aldy Hernandez
  1 sibling, 2 replies; 17+ messages in thread
From: Dale Johannesen @ 2004-03-03 18:44 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc


On Mar 3, 2004, at 8:56 AM, Dorit Naishlos wrote:

> Hi,
>
> I think there is a problem in the way we model the setting of subregs 
> (insn
> "insvsi") in rs6000, or rather - a problem in the way reload phase 
> handles
> these patterns when they are generated to express an initialization of 
> a
> vector register. Consider the following example:
>
> typedef int __attribute__((mode(V8HI))) v8hi;
> #define N 1024
> void foo5 (short n){
>     short a[N];
>     v8hi *pa = (v8hi *)a;
>>>  v8hi va = {n,n,n,n,n,n,n,n};
>     int i;
>
>     for (i=0; i<N/8; i++){
>       pa[i] = va;
>     }
>     bar1 (pa[2]);
> }

Those loads and stores in the loop are happening because RA
chose to assign int registers to the va pseudo.  This is obviously
not a good idea and suggests some problem in computing the register 
costs.

  Register 120 costs: BASE_REGS:20000 GENERAL_REGS:20000 
ALTIVEC_REGS:560000
SPEC_OR_GEN_REGS:2048 NON_FLOAT_REGS:20448 MEM:4896

That (from the .lreg dump) doesn't look right, I suggest investigating 
why the
Altivec number is so high.  Comparison with x86 may be helpful.

A couple of observations that won't help with your main problem:
>     short a[N];
>     v8hi *pa = (v8hi *)a;
This sort of thing generally violates strict aliasing rules, and has a 
tendency
not to work as expected when scheduled.   Vectors are outside the 
standard,
but I don't think there's any special handling for aliasing them, is 
there Aldy?
I don't see why pa will necessarily by 16-byte aligned, either, 
although it always
was when I tried.

The shift-i-and-add inside the loop isn't strength reduced, and should 
be.
It is if you change pretty much anything, though, so it's hitting some 
edge case.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-03 17:52 ` David Edelsohn
@ 2004-03-03 18:16   ` Dorit Naishlos
  0 siblings, 0 replies; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-03 18:16 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc


Thanks David. This indeed solves the ICE. However, we still need to address
the general problem of the invariant spill code inside the loop...

thanks,
dorit



                                                                                                                                 
                      David Edelsohn                                                                                             
                      <dje@makai.watson        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                          
                      .ibm.com>                cc:       gcc@gcc.gnu.org                                                         
                                               Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers         
                      03/03/2004 19:52                                                                                           
                                                                                                                                 




>>>>> Dorit Naishlos writes:

Dorit> Actually, if you try to compile the above program with -maltivec,
you'll
Dorit> get ICE'd during reload with the following error:

Dorit> simd-inv.c: In function `foo5':
Dorit> simd-inv.c:29: error: unrecognizable insn:
Dorit> (insn 88 87 89 0 (set (mem:V8HI (reg:SI 9 r9) [0 S16 A8])
Dorit> (reg:V8HI 2 r2 [126])) -1 (nil)
Dorit> (nil))
Dorit> simd-inv.c:29: internal compiler error: in extract_insn, at
recog.c:2083

Dorit> This is because of a restriction I added a month ago to the
following
Dorit> define_insn in altivec.md (last 2 lines):

Dorit> (define_insn "*movv8hi_internal1"
Dorit> [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
Dorit> (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
Dorit> "TARGET_ALTIVEC
>>> && (altivec_register_operand (operands[0], V8HImode)
>>> || altivec_register_operand (operands[1], V8HImode))"

Dorit> If I remove the above 2 lines, compilation succeeds; However... we
get the
Dorit> same inefficiencies that brought us to add these 2 lines in the
first place
Dorit> (loop invariants don't get pulled out -
Dorit> http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02884.html);

             The patch applied to mainline was slightly different:

(define_insn "*movv8hi_internal1"
  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
        (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
  "TARGET_ALTIVEC
   && (register_operand (operands[0], V8HImode)
       || register_operand (operands[1], V8HImode))"

exactly because of the type of failure you see.  You should correct the
patterns in LNO branch.

David



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: rs6000.md/altivec.md problem in setting of vector registers
  2004-03-03 16:46 Dorit Naishlos
@ 2004-03-03 17:52 ` David Edelsohn
  2004-03-03 18:16   ` Dorit Naishlos
  2004-03-03 18:44 ` Dale Johannesen
  1 sibling, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2004-03-03 17:52 UTC (permalink / raw)
  To: Dorit Naishlos; +Cc: gcc

>>>>> Dorit Naishlos writes:

Dorit> Actually, if you try to compile the above program with -maltivec, you'll
Dorit> get ICE'd during reload with the following error:

Dorit> simd-inv.c: In function `foo5':
Dorit> simd-inv.c:29: error: unrecognizable insn:
Dorit> (insn 88 87 89 0 (set (mem:V8HI (reg:SI 9 r9) [0 S16 A8])
Dorit> (reg:V8HI 2 r2 [126])) -1 (nil)
Dorit> (nil))
Dorit> simd-inv.c:29: internal compiler error: in extract_insn, at recog.c:2083

Dorit> This is because of a restriction I added a month ago to the following
Dorit> define_insn in altivec.md (last 2 lines):

Dorit> (define_insn "*movv8hi_internal1"
Dorit> [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
Dorit> (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
Dorit> "TARGET_ALTIVEC
>>> && (altivec_register_operand (operands[0], V8HImode)
>>> || altivec_register_operand (operands[1], V8HImode))"

Dorit> If I remove the above 2 lines, compilation succeeds; However... we get the
Dorit> same inefficiencies that brought us to add these 2 lines in the first place
Dorit> (loop invariants don't get pulled out -
Dorit> http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02884.html);

	The patch applied to mainline was slightly different:

(define_insn "*movv8hi_internal1"
  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
        (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
  "TARGET_ALTIVEC 
   && (register_operand (operands[0], V8HImode) 
       || register_operand (operands[1], V8HImode))"

exactly because of the type of failure you see.  You should correct the
patterns in LNO branch.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* rs6000.md/altivec.md problem in setting of vector registers
@ 2004-03-03 16:46 Dorit Naishlos
  2004-03-03 17:52 ` David Edelsohn
  2004-03-03 18:44 ` Dale Johannesen
  0 siblings, 2 replies; 17+ messages in thread
From: Dorit Naishlos @ 2004-03-03 16:46 UTC (permalink / raw)
  To: gcc

Hi,

I think there is a problem in the way we model the setting of subregs (insn
"insvsi") in rs6000, or rather - a problem in the way reload phase handles
these patterns when they are generated to express an initialization of a
vector register. Consider the following example:

typedef int __attribute__((mode(V8HI))) v8hi;
#define N 1024
void foo5 (short n){
    short a[N];
    v8hi *pa = (v8hi *)a;
>>  v8hi va = {n,n,n,n,n,n,n,n};
    int i;

    for (i=0; i<N/8; i++){
      pa[i] = va;
    }
    bar1 (pa[2]);
}

In the RTL, this is expressed as a sequence of 8 insns that copy 'n' (which
resides in a scalar register) into each of 8 subregs in the temporary 'va'.
This takes place before the loop, and inside the loop we have a vector
store of 'va'. Later on, this initialization sequence of subregs will have
to be spilled - the 8 scalar registers (which hold the value of 'n') will
be spilled to memory, and a vector load will combine the 8 values into one
vector register.

This is indeed what happens when I compile the above program on i386 with
-msse2; the resulting code is efficient - with the 8 scalar stores and one
vector load before the loop, and only a vector store inside the loop.

However, compiling for powerpc with -maltivec, instead of spilling the 8
scalar registers before the loop, the register allocator decides to spill
the vector store insn which is inside the loop. As a result, we get spill
code of invariant data inside the loop. Here is the resulting assembly (the
spill code is marked with '>>'):

L2:
        addi r7,r1,2112
        slwi r3,r2,4
>>      stw r9,0(r7)
        addi r2,r2,1
>>      stw r10,4(r7)
>>      stw r11,8(r7)
>>      stw r12,12(r7)
>>      lvx v0,0,r7
        stvx v0,r3,r8
        bdnz L2

Below is some more detail; My question is - how to fix the machine
description such that reload phase will spill the subreg initialization
insns (outside the loop) as it does for i386 ?

thanks,

dorit

More detail:
====================

Actually, if you try to compile the above program with -maltivec, you'll
get ICE'd during reload with the following error:

simd-inv.c: In function `foo5':
simd-inv.c:29: error: unrecognizable insn:
(insn 88 87 89 0 (set (mem:V8HI (reg:SI 9 r9) [0 S16 A8])
        (reg:V8HI 2 r2 [126])) -1 (nil)
    (nil))
simd-inv.c:29: internal compiler error: in extract_insn, at recog.c:2083

This is because of a restriction I added a month ago to the following
define_insn in altivec.md (last 2 lines):

   (define_insn "*movv8hi_internal1"
     [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
           (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
     "TARGET_ALTIVEC
>>    && (altivec_register_operand (operands[0], V8HImode)
>>        || altivec_register_operand (operands[1], V8HImode))"

If I remove the above 2 lines, compilation succeeds; However... we get the
same inefficiencies that brought us to add these 2 lines in the first place
(loop invariants don't get pulled out -
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02884.html);

(another question is how to model "*movv8hi_internal1" - we want to keep
the new restriction for the case of constants, however, looks like it's too
strict a restriction for non-constant inputs).

Here is what happens during compilation of the above example program when I
remove the 2 restriction lines from define_insn:

Up to phase .c.24.lreg,
=======================
we have a sequence of 8 insns in the loop prolog that initialize the vector
temporary 'va'; each of these insns looks something like:

(insn:HI ... (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va ]) 0)
            (const_int 16 [0x10])
            (const_int 0 [0x0]))
     (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 11 (insn_list 3 (nil)))
    (nil))

Inside the loop we have the store of 'va' into memory:

LOOP:
(insn:HI 25 24 27 1 (set (mem:V8HI (plus:SI (reg:SI 123)
                (reg/v/f:SI 119 [ pa ])) [4 S16 A128])
    (reg/v:V8HI 120 [ va ])) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
    (expr_list:REG_DEAD (reg:SI 123)
        (nil)))

During phase .c.25.greg,
========================
the compiler does not report any spills for the initialization insns,
however it reports a spill for the vector store insn that's in the loop:

Reloads for insn # 25
Reload 0: GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), optional,
can't combine, secondary_reload_p
Reload 1: reload_out (V8HI) = (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
                  (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
        NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
        reload_out_reg: (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
                  (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
        secondary_out_reload = 0

Reload 2: reload_in (SI) = (plus:SI (reg/f:SI 1 r1)
                  (const_int 2112 [0x840]))
        BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine
        reload_in_reg: (plus:SI (reg/f:SI 1 r1)
                  (const_int 2112 [0x840]))
        reload_reg_rtx: (reg:SI 7 r7)
Reload 3: reload_in (V8HI) = (reg/v:V8HI 9 r9 [orig:120 va ] [120])
        ALTIVEC_REGS, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v:V8HI 9 r9 [orig:120 va ] [120])
        reload_reg_rtx: (reg:V8HI 77 v0)

As a result, we now have a spill in the loop:
==============================================
LOOP:
[1] (insn 62 61 63 1 (set (mem:V8HI (reg:SI 7 r7) [0 S16 A8])
    (reg/v:V8HI 9 r9 [orig:120 va ] [120])) 558 {*movv8hi_internal1}
(nil) (nil))
[2] (insn 63 62 25 1 (set (reg:V8HI 77 v0)
        (mem:V8HI (reg:SI 7 r7) [0 S16 A8])) 550 {altivec_lvx_8hi} (nil)
      (nil))
[3] (insn:HI 25 63 27 1 (set (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
        (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
      (reg:V8HI 77 v0)) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
    (nil))

insns [1] and [2] are the new spill code (insn [1] is the one that causes
the ICE I described above). Finally, during phase .c.29.rnreg, insn [1] is
expanded into a sequence of scalar insns, each of which looks like:

(insn 64 61 65 1 (set (mem:SI (reg:SI 7 r7) [0 S4 A8])
        (reg:SI 9 r9 [ va ])) 309 {*movsi_internal1} (nil)
    (nil))

In i386, the RTL of the subreg initialization insns looks as follows:

(insn:HI 41 40 43 0 (parallel [
            (set (subreg:SI (reg/v:V8HI 61 [ va ]) 8)
                (ior:SI (reg:SI 76)
                    (reg:SI 65)))
            (clobber (reg:CC 17 flags))
        ]) 209 {*iorsi_1} (insn_list 39 (nil))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:SI 76)
            (nil))))

and they get spilled during .c.25.greg, and remain out side the loop.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2004-03-23 20:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <OF6E029669.0E5E3F05-ONC2256E5C.000482C4-C2256E5C.0004D314@il.ibm.com>
2004-03-19  8:45 ` rs6000.md/altivec.md problem in setting of vector registers David Edelsohn
2004-03-21  0:36   ` Dorit Naishlos
2004-03-23 22:10     ` David Edelsohn
2004-03-23 17:03       ` Dorit Naishlos
2004-03-19 20:59 ` Dale Johannesen
2004-03-21  1:47   ` Dorit Naishlos
2004-03-03 16:46 Dorit Naishlos
2004-03-03 17:52 ` David Edelsohn
2004-03-03 18:16   ` Dorit Naishlos
2004-03-03 18:44 ` Dale Johannesen
2004-03-05  0:06   ` Dorit Naishlos
2004-03-05  0:23     ` Dale Johannesen
2004-03-09 18:46       ` David Edelsohn
2004-03-11 22:38     ` David Edelsohn
2004-03-11 23:31       ` Richard Henderson
2004-03-12  3:14         ` David Edelsohn
2004-03-07 18:30   ` Aldy Hernandez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).