Date: Mon, 30 Sep 2013 12:55:00 -0000
From: Vidya Praveen <vidyapraveen@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>, "ook@ucw.cz" <ook@ucw.cz>
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20130930125454.GD3460@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com> <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr> <20130924150425.GE22907@e103625-lin.cambridge.arm.com> <alpine.LNX.2.00.1309251123490.29411@zhemvz.fhfr.qr> <20130927145008.GA861@e103625-lin.cambridge.arm.com> <20130927151945.GB861@e103625-lin.cambridge.arm.com>
In-Reply-To: <20130927151945.GB861@e103625-lin.cambridge.arm.com>

On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> [...]
> > > > I can't really insist on the single lane load.. something like:
> > > >
> > > > vc:V4SI[0] = c
> > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > >
> > > > Or is there any other way to do this?
> > >
> > > Can you elaborate on "I can't really insist on the single lane load"?
> > > What's the single lane load in your example?
> >
> > Loading just one lane of the vector like this:
> >
> > vc:V4SI[0] = c // from the above scalar example
> >
> > or
> >
> > vc:V4SI[0] = c[2]
> >
> > is what I meant by single lane load. In this example:
> >
> > t = c[2]
> > ...
> > vb:v4si = b[0:3]
> > vc:v4si = { t, t, t, t }
> > va:v4si = vb:v4si <op> vc:v4si
> >
> > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be
> > seen as vec_select:SI (vect_t 0) ).
> >
> > > I'd expect the instruction
> > > pattern as quoted to just work (and I hope we expand an uniform
> > > constructor { a, a, a, a } properly using vec_duplicate).
> >
> > As much as I went through the code, this is only done using vect_init. It is
> > not expanded as vec_duplicate from, for example, store_constructor() of expr.c
>
> Do you see any issues if we expand such constructor as vec_duplicate directly
> instead of going through vect_init way?

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me if this
is acceptable or if it makes sense:

- Introduce standard pattern names:

"vmulim4" - vector multiply with the second operand as an indexed operand

Example:

(define_insn "vmuliv4si4"
  [(set (match_operand:V4SI 0 "register_operand")
        (mult:V4SI (match_operand:V4SI 1 "register_operand")
                   (vec_duplicate:V4SI
                     (vec_select:SI
                       (match_operand:V4SI 2 "register_operand")
                       (match_operand:SI 3 "immediate_operand")))))]
  ...
)
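
To make the intended semantics of the pattern above concrete, here is a
minimal scalar sketch in C. The function name vmuli_v4si_model and the
V4SI_LANES macro are hypothetical and exist only for illustration; the sketch
assumes 32-bit lanes and a lane index known to be in range.

```c
#include <assert.h>
#include <stdint.h>

#define V4SI_LANES 4

/* Hypothetical scalar model of the proposed "vmuliv4si4" pattern:
   dst[i] = src[i] * vec[lane] for every lane i.  */
static void vmuli_v4si_model(int32_t dst[V4SI_LANES],
                             const int32_t src[V4SI_LANES],
                             const int32_t vec[V4SI_LANES],
                             int lane)
{
    assert(lane >= 0 && lane < V4SI_LANES);

    /* vec_select:SI picks one element of the indexed operand.  */
    const int32_t scalar = vec[lane];

    /* vec_duplicate:V4SI broadcasts it; mult:V4SI multiplies lane-wise.  */
    for (int i = 0; i < V4SI_LANES; i++)
        dst[i] = src[i] * scalar;
}
```

With src = {1, 2, 3, 4}, vec = {10, 20, 30, 40} and lane 2, every element of
src is multiplied by 30.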

"vlmovmn3" - move where one operand is a specific lane of a vector and the
             other is a scalar.

Example:

(define_insn "vlmovv4sisi3"
  [(set (vec_select:SI (match_operand:V4SI 0 "register_operand")
                       (match_operand:SI 1 "immediate_operand"))
        (match_operand:SI 2 "memory_operand"))]
  ...
)
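
As a rough illustration of what the lane move above is meant to do, the
following C sketch models it on a plain array. The name vlmov_v4si_model is
hypothetical, and the sketch assumes the lanes that are not selected keep
their previous contents.

```c
#include <assert.h>
#include <stdint.h>

#define V4SI_LANES 4

/* Hypothetical scalar model of the proposed "vlmovv4sisi3" pattern:
   load one SI value from memory into a single lane of the vector.
   Assumption: all other lanes are left unchanged.  */
static void vlmov_v4si_model(int32_t vec[V4SI_LANES], int lane,
                             const int32_t *mem)
{
    assert(lane >= 0 && lane < V4SI_LANES);
    vec[lane] = *mem;
}
```

For example, vlmov_v4si_model(vc, 0, &c[2]) corresponds to the single lane
load vc[0] = c[2] discussed earlier in the thread.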

- Identify the following idiom and expand through the above standard patterns:

  t = c[m]
  vc[0:n] = { t, t, t, t }
  a[0:n] = b[0:n] * vc[0:n]

as

 (insn (set (vec_select:SI (reg:V4SI 0) 0) (mem:SI ... )))
 (insn (set (reg:V4SI 1)
            (mult:V4SI (reg:V4SI 2)
                       (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 0) 0)))))
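
For reference, the kind of source loop this idiom comes from, and a scalar
model of the two-insn sequence above, can be sketched in C as follows. Both
function names and the LANES macro are hypothetical, and the model assumes n
is a multiple of the vector length; it illustrates the intended equivalence
only and is not a description of what the vectorizer emits today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Source-level idiom: every iteration multiplies by the same element c[m].  */
static void mul_by_element_scalar(int32_t *a, const int32_t *b,
                                  const int32_t *c, size_t m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[m];
}

/* Model of the proposed expansion (assumes n is a multiple of LANES).  */
static void mul_by_element_lanewise(int32_t *a, const int32_t *b,
                                    const int32_t *c, size_t m, size_t n)
{
    int32_t vc[LANES] = {0};

    assert(n % LANES == 0);

    /* First insn: single lane load of c[m] into lane 0 of vc.  */
    vc[0] = c[m];

    /* Second insn, once per vector: multiply each lane of b by lane 0 of vc.  */
    for (size_t i = 0; i < n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            a[i + lane] = b[i + lane] * vc[0];
}
```

Both functions should produce the same result for any input where the
multiplications do not overflow.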

If this path is acceptable, then I can extend this to support

"vmaddim4" - multiply and add (with indexed element as multiplier)
"vmsubim4" - multiply and subtract (with indexed element as multiplier)

Please let me know your thoughts.

Cheers
VP