public inbox for gcc@gcc.gnu.org
* rs6000.md/altivec.md problem in setting of vector registers
@ 2004-03-03 16:46 Dorit Naishlos
From: Dorit Naishlos @ 2004-03-03 16:46 UTC (permalink / raw)
  To: gcc

Hi,

I think there is a problem in the way we model the setting of subregs (insn
"insvsi") in rs6000, or rather, a problem in the way the reload phase handles
these patterns when they are generated to express an initialization of a
vector register. Consider the following example:

typedef int __attribute__((mode(V8HI))) v8hi;
#define N 1024
void foo5 (short n){
    short a[N];
    v8hi *pa = (v8hi *)a;
>>  v8hi va = {n,n,n,n,n,n,n,n};
    int i;

    for (i=0; i<N/8; i++){
      pa[i] = va;
    }
    bar1 (pa[2]);
}

In the RTL, this is expressed as a sequence of 8 insns that copy 'n' (which
resides in a scalar register) into each of 8 subregs in the temporary 'va'.
This takes place before the loop, and inside the loop we have a vector
store of 'va'. Later on, this initialization sequence of subregs will have
to be spilled - the 8 scalar registers (which hold the value of 'n') will
be spilled to memory, and a vector load will combine the 8 values into one
vector register.

This is indeed what happens when I compile the above program on i386 with
-msse2; the resulting code is efficient - with the 8 scalar stores and one
vector load before the loop, and only a vector store inside the loop.
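
To make the desired shape concrete, here is a rough C sketch of what the
efficient code amounts to: eight scalar stores into a stack slot and one
vector load, both before the loop, and only a vector store inside it. The
names splat_via_memory and fill_hoisted are made up for illustration, and
the vector type uses GCC's generic vector_size attribute rather than the
mode attribute from the example above; this only models the data movement,
not what reload actually emits.

```c
#include <assert.h>
#include <string.h>

#define N 1024

/* 16-byte vector of eight shorts (GCC generic vector extension). */
typedef short v8hi_t __attribute__((vector_size(16)));

/* Hypothetical helper: build the splat once through memory.
   Eight scalar stores into a stack slot, then one vector-sized copy
   out of that slot (standing in for the single vector load). */
static v8hi_t splat_via_memory(short n)
{
    short slot[8];
    v8hi_t v;
    int i;

    for (i = 0; i < 8; i++)
        slot[i] = n;
    memcpy(&v, slot, sizeof v);
    return v;
}

/* Hypothetical helper: fill a[0..N-1] with n.  The splat is computed
   before the loop; each iteration does one vector-sized store only. */
void fill_hoisted(short *a, short n)
{
    v8hi_t va;
    int i;

    assert(a != NULL);
    va = splat_via_memory(n);
    for (i = 0; i < N / 8; i++)
        memcpy(a + 8 * i, &va, sizeof va);
}
```

Calling fill_hoisted (a, n) on a short array of N elements leaves every
element equal to n, with all invariant work done once ahead of the loop.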

However, compiling for powerpc with -maltivec, instead of spilling the 8
scalar registers before the loop, the register allocator decides to spill
the vector store insn which is inside the loop. As a result, we get spill
code of invariant data inside the loop. Here is the resulting assembly (the
spill code is marked with '>>'):

L2:
        addi r7,r1,2112
        slwi r3,r2,4
>>      stw r9,0(r7)
        addi r2,r2,1
>>      stw r10,4(r7)
>>      stw r11,8(r7)
>>      stw r12,12(r7)
>>      lvx v0,0,r7
        stvx v0,r3,r8
        bdnz L2

More detail follows below. My question is: how should the machine
description be fixed so that the reload phase spills the subreg
initialization insns (outside the loop), as it does for i386?

thanks,

dorit


More detail:
====================

Actually, if you try to compile the above program with -maltivec, the
compiler ICEs during reload with the following error:

simd-inv.c: In function `foo5':
simd-inv.c:29: error: unrecognizable insn:
(insn 88 87 89 0 (set (mem:V8HI (reg:SI 9 r9) [0 S16 A8])
        (reg:V8HI 2 r2 [126])) -1 (nil)
    (nil))
simd-inv.c:29: internal compiler error: in extract_insn, at recog.c:2083

This is because of a restriction I added a month ago to the following
define_insn in altivec.md (last 2 lines):

   (define_insn "*movv8hi_internal1"
     [(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
           (match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
     "TARGET_ALTIVEC
>>    && (altivec_register_operand (operands[0], V8HImode)
>>        || altivec_register_operand (operands[1], V8HImode))"

If I remove the above 2 lines, compilation succeeds. However, we then get
the same inefficiencies that led us to add these 2 lines in the first place
(loop invariants don't get pulled out; see
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02884.html).

(Another question is how to model "*movv8hi_internal1": we want to keep
the new restriction for the case of constants, but it looks like it is too
strict a restriction for non-constant inputs.)
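
The ICE follows from the added condition: insn 88 stores a V8HI value held
in GPR r2 to memory, so neither operand is an AltiVec register and the
pattern no longer matches. A toy model in C may make this easier to see.
The enum and function names below are hypothetical and are not GCC
internals; the real altivec_register_operand predicate inspects RTL and
has details (such as pseudo registers) that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand kinds, for illustration only. */
enum operand_kind { OP_MEM, OP_GPR, OP_ALTIVEC_REG, OP_CONST_VECTOR };

static bool is_altivec_reg(enum operand_kind k)
{
    return k == OP_ALTIVEC_REG;
}

/* Models the two added lines of the insn condition:
   at least one of the two operands must be an AltiVec register. */
bool movv8hi_condition_ok(enum operand_kind dest, enum operand_kind src)
{
    return is_altivec_reg(dest) || is_altivec_reg(src);
}

/* A few cases worked through against the model. */
void check_examples(void)
{
    /* Shape of insn 88: (set (mem:V8HI ...) (reg:V8HI 2 r2)), rejected. */
    assert(!movv8hi_condition_ok(OP_MEM, OP_GPR));
    /* An ordinary vector store from an AltiVec register is accepted. */
    assert(movv8hi_condition_ok(OP_MEM, OP_ALTIVEC_REG));
    /* Loading a vector constant into an AltiVec register is accepted. */
    assert(movv8hi_condition_ok(OP_ALTIVEC_REG, OP_CONST_VECTOR));
    /* A GPR-to-GPR copy in V8HImode is rejected as well. */
    assert(!movv8hi_condition_ok(OP_GPR, OP_GPR));
}
```

In this model, the alternatives "o,r", "r,o" and "r,r" of the pattern can
never satisfy the condition, which is exactly the spill path reload tries
to use here.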

Here is what happens during compilation of the above example program when I
remove the 2 restriction lines from define_insn:

Up to phase .c.24.lreg,
=======================
we have a sequence of 8 insns in the loop prolog that initialize the vector
temporary 'va'; each of these insns looks something like:

(insn:HI ... (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va ]) 0)
            (const_int 16 [0x10])
            (const_int 0 [0x0]))
     (reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 11 (insn_list 3 (nil)))
    (nil))
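
For readers less used to zero_extract as a destination, here is a small C
model of what one such insn does to a 32-bit word: it overwrites a 16-bit
field starting at bit position 0 and leaves the other bits alone. The
function name is hypothetical. The sketch assumes bit positions are counted
from the most significant bit, which is my understanding of how rs6000
numbers bit-fields (BITS_BIG_ENDIAN); treat that as an assumption rather
than something shown in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bit-field insert into a 32-bit word.
   `pos` is counted from the most significant bit (assumption stated
   in the text above); the low `width` bits of `value` are inserted. */
uint32_t insert_bits_msb0(uint32_t word, unsigned width, unsigned pos,
                          uint32_t value)
{
    uint32_t field;
    unsigned shift;

    assert(width >= 1 && width <= 32 && pos <= 32 - width);

    /* Mask with `width` low bits set; avoid shifting by 32. */
    field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    /* Distance of the field's lowest bit from bit 0 (the LSB). */
    shift = 32 - pos - width;

    return (word & ~(field << shift)) | ((value & field) << shift);
}
```

With width 16 and position 0, as in the insn above, only the upper
halfword of the SImode subreg is replaced; a matching insert at position
16 would fill the lower halfword.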

Inside the loop we have the store of 'va' into memory:

LOOP:
(insn:HI 25 24 27 1 (set (mem:V8HI (plus:SI (reg:SI 123)
                (reg/v/f:SI 119 [ pa ])) [4 S16 A128])
    (reg/v:V8HI 120 [ va ])) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
    (expr_list:REG_DEAD (reg:SI 123)
        (nil)))

During phase .c.25.greg,
========================
the compiler does not report any spills for the initialization insns, but
it does report a spill for the vector store insn that's in the loop:

Reloads for insn # 25
Reload 0: GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), optional,
can't combine, secondary_reload_p
Reload 1: reload_out (V8HI) = (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
                  (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
        NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
        reload_out_reg: (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
                  (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
        secondary_out_reload = 0

Reload 2: reload_in (SI) = (plus:SI (reg/f:SI 1 r1)
                  (const_int 2112 [0x840]))
        BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine
        reload_in_reg: (plus:SI (reg/f:SI 1 r1)
                  (const_int 2112 [0x840]))
        reload_reg_rtx: (reg:SI 7 r7)
Reload 3: reload_in (V8HI) = (reg/v:V8HI 9 r9 [orig:120 va ] [120])
        ALTIVEC_REGS, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v:V8HI 9 r9 [orig:120 va ] [120])
        reload_reg_rtx: (reg:V8HI 77 v0)

As a result, we now have a spill in the loop:
==============================================
LOOP:
[1] (insn 62 61 63 1 (set (mem:V8HI (reg:SI 7 r7) [0 S16 A8])
    (reg/v:V8HI 9 r9 [orig:120 va ] [120])) 558 {*movv8hi_internal1}
(nil) (nil))
[2] (insn 63 62 25 1 (set (reg:V8HI 77 v0)
        (mem:V8HI (reg:SI 7 r7) [0 S16 A8])) 550 {altivec_lvx_8hi} (nil)
      (nil))
[3] (insn:HI 25 63 27 1 (set (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
        (reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
      (reg:V8HI 77 v0)) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
    (nil))

insns [1] and [2] are the new spill code (insn [1] is the one that causes
the ICE I described above). Finally, during phase .c.29.rnreg, insn [1] is
expanded into a sequence of scalar insns, each of which looks like:

(insn 64 61 65 1 (set (mem:SI (reg:SI 7 r7) [0 S4 A8])
        (reg:SI 9 r9 [ va ])) 309 {*movsi_internal1} (nil)
    (nil))


On i386, the RTL of the subreg initialization insns looks as follows:

(insn:HI 41 40 43 0 (parallel [
            (set (subreg:SI (reg/v:V8HI 61 [ va ]) 8)
                (ior:SI (reg:SI 76)
                    (reg:SI 65)))
            (clobber (reg:CC 17 flags))
        ]) 209 {*iorsi_1} (insn_list 39 (nil))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:SI 76)
            (nil))))

and they get spilled during .c.25.greg, and remain outside the loop.
