Re: rs6000.md/altivec.md problem in setting of vector registers

public inbox for gcc@gcc.gnu.org

From: Dorit Naishlos <DORIT@il.ibm.com>
To: David Edelsohn <dje@makai.watson.ibm.com>
Cc: gcc@gcc.gnu.org
Subject: Re: rs6000.md/altivec.md problem in setting of vector registers
Date: Tue, 23 Mar 2004 17:03:00 -0000
Message-ID: <OF2DC81A02.1AE457EC-ONC2256E60.002FED31-C2256E60.0030861B@il.ibm.com>
In-Reply-To: <200403202218.i2KMIGT27996@makai.watson.ibm.com>


I won't be able to dedicate much time at the moment to this rs6000 code
generation problem. I've included a test case that displays it, in case
anyone would like to look into it:

typedef int __attribute__((mode(V4SI))) v4si;
typedef int aint __attribute__ ((__aligned__(16)));
#define N 1024
typedef union {
   aint a[N];
   v4si pa[N/4];
} vec_union;

void
foo (short n){
  vec_union a;
  v4si va = {n,n,n,n};
  int i;

  for (i=0; i<N/4; i++){
    a.pa[i] = va;
  }
}

Below is the code that is being generated on powerpc and i386.

dorit


This is the code generated on powerpc, compiling with -O3 -floop-optimize2
-maltivec:
(relevant code marked with ">>";
4 scalar stores + 1 vector load, all invariant, all in the loop)

foo:
        mfspr r5,256
        oris r12,r5,0x8000
        stw r5,-8(r1)
        mtspr 256,r12
        li r0,256
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        mr r9,r3
        mflr r31
        mr r10,r3
        stw r4,8(r1)
        mr r11,r3
        mr r12,r3
        mtctr r0
        addis r2,r31,ha16(L_a$non_lazy_ptr-"L00000000001$pb")
        lwz r8,lo16(L_a$non_lazy_ptr-"L00000000001$pb")(r2)
        li r2,0
L4:
        addi r7,r1,-32
        slwi r3,r2,4
>>      stw r9,0(r7)
        addi r2,r2,1
>>      stw r10,4(r7)
>>      stw r11,8(r7)
>>      stw r12,12(r7)
>>      lvx v0,0,r7
        stvx v0,r3,r8
        bdnz L4

        lwz r8,-8(r1)
        mtspr 256,r8
        lwz r6,8(r1)
        lwz r31,-4(r1)
        mtlr r6
        blr


This is the code generated on i386, compiling with -O3 -msse2:
(relevant code marked with ">>";
4 scalar stores out of the loop. 1 invariant vector load, in the loop)

foo:
        pushl   %ebp
        xorl    %edx, %edx
        movl    %esp, %ebp
        subl    $24, %esp
        movswl  8(%ebp),%eax
>>      movl    %eax, -24(%ebp)
>>      movl    %eax, -20(%ebp)
>>      movl    %eax, -16(%ebp)
>>      movl    %eax, -12(%ebp)
        movdqa  -24(%ebp), %xmm0
        .p2align 4,,15
.L5:
>>      movl    %edx, %ecx
        incl    %edx
        sall    $4, %ecx
        movdqa  %xmm0, a(%ecx)
        cmpl    $255, %edx
        jle     .L5

        leave
        ret
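
For readers who prefer C to assembly, the difference between the two
listings can be sketched at the source level. This is only an illustration,
not compiler output: the union lane_union, the arrays out_in_loop and
out_hoisted, and the function names are made up for the sketch, and
vector_size(16) is assumed as a stand-in for the mode(V4SI) typedef of the
test case. fill_in_loop mirrors the shape of the powerpc listing (four scalar
stores and a vector load inside L4), while fill_hoisted mirrors the i386
listing, where the four stores and the movdqa load appear before .L5.

```c
#include <string.h>

#define N 1024

/* Assumption: vector_size(16) stands in for mode(V4SI) from the test case. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper type: the same 16 bytes viewed as 4 ints or 1 vector. */
typedef union {
  int s[4];
  v4si v;
} lane_union;

static v4si out_in_loop[N / 4];
static v4si out_hoisted[N / 4];

/* Shape of the powerpc listing: the invariant vector is rebuilt
   on every iteration (4 scalar stores + 1 vector load in the loop). */
void fill_in_loop(short n)
{
  for (int i = 0; i < N / 4; i++) {
    lane_union t;
    t.s[0] = n; t.s[1] = n; t.s[2] = n; t.s[3] = n;
    out_in_loop[i] = t.v;
  }
}

/* Shape of the i386 listing: the vector is built once before the loop,
   and only the vector store remains inside it. */
void fill_hoisted(short n)
{
  lane_union t;
  t.s[0] = n; t.s[1] = n; t.s[2] = n; t.s[3] = n;
  v4si va = t.v;

  for (int i = 0; i < N / 4; i++)
    out_hoisted[i] = va;
}

/* Both shapes must fill memory identically; returns 1 if they do. */
int same_result(short n)
{
  fill_in_loop(n);
  fill_hoisted(n);
  return memcmp(out_in_loop, out_hoisted, sizeof out_in_loop) == 0;
}
```

Both functions compute the same thing, so the only question is whether the
compiler moves the invariant work out of the loop by itself.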




                                                                                                                                   
David Edelsohn <dje@makai.watson.ibm.com> wrote on 21/03/2004 00:18:
To:      Dorit Naishlos/Haifa/IBM@IBMIL
cc:      gcc@gcc.gnu.org
Subject: Re: rs6000.md/altivec.md problem in setting of vector registers
                                                                                                                                   




>>>>> Dorit Naishlos writes:

Dorit> I focused on understanding what in the machine description explains the
Dorit> different ways Reload handles the same pattern ('set subreg') on the two
Dorit> platforms (i386/powerpc).

             Altivec and SSE are integrated in their respective architectures
in different ways, so what GCC does for one is not always appropriate for the
other. The vec_set and vec_extract patterns provide explicit control over
setting vector elements, so that is probably the best way to achieve the
optimal behavior.

David
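
vec_set and vec_extract are names of patterns in the machine description, so
they are not something user code calls. As a rough source-level analogy only,
later GCC releases allow a vector type to be subscripted like an array, which
expresses "set one element" and "extract one element" directly instead of
going through memory. Whether a given compiler expands this through the
vec_set/vec_extract patterns depends on the target and the GCC version, so
that mapping is an assumption here. The names splat4 and lane are
hypothetical, and vector_size(16) is again assumed in place of mode(V4SI).

```c
/* Assumption: vector_size(16) stands in for mode(V4SI). */
typedef int v4si __attribute__((vector_size(16)));

/* Set each of the four elements of *out to n, one element at a time.
   Vectors are passed by pointer to stay clear of vector-ABI differences. */
void splat4(v4si *out, int n)
{
  v4si v = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    v[i] = n;            /* element set */
  *out = v;
}

/* Read back element i (0..3) of *v. */
int lane(const v4si *v, int i)
{
  return (*v)[i];        /* element extract */
}
```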

