From: Vidya Praveen
To: Richard Biener
Cc: "gcc@gcc.gnu.org", "ook@ucw.cz"
Date: Mon, 30 Sep 2013 12:55:00 -0000
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20130930125454.GD3460@e103625-lin.cambridge.arm.com>
In-Reply-To: <20130927151945.GB861@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>
 <20130924150425.GE22907@e103625-lin.cambridge.arm.com>
 <20130927145008.GA861@e103625-lin.cambridge.arm.com>
 <20130927151945.GB861@e103625-lin.cambridge.arm.com>

On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> [...]
> > > > I can't really insist on the single lane load.. something like:
> > > >
> > > > vc:V4SI[0] = c
> > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > va:V4SI = vb:V4SI vt:V4SI
> > > >
> > > > Or is there any other way to do this?
> > >
> > > Can you elaborate on "I can't really insist on the single lane load"?
> > > What's the single lane load in your example?
> >
> > Loading just one lane of the vector like this:
> >
> > vc:V4SI[0] = c // from the above scalar example
> >
> > or
> >
> > vc:V4SI[0] = c[2]
> >
> > is what I meant by single lane load. In this example:
> >
> > t = c[2]
> > ...
> > vb:v4si = b[0:3]
> > vc:v4si = { t, t, t, t }
> > va:v4si = vb:v4si vc:v4si
> >
> > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be
> > seen as vec_select:SI (vect_t 0) ).
> >
> > > I'd expect the instruction
> > > pattern as quoted to just work (and I hope we expand an uniform
> > > constructor { a, a, a, a } properly using vec_duplicate).
> >
> > As much as I went through the code, this is only done using vect_init. It is
> > not expanded as vec_duplicate from, for example, store_constructor() of expr.c
>
> Do you see any issues if we expand such constructor as vec_duplicate directly
> instead of going through vect_init way?

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me if this
is acceptable or if it makes sense:

- Introduce standard pattern names

  "vmulim4" - vector multiply with second operand as indexed operand

  Example:

  (define_insn "vmuliv4si4"
    [(set (match_operand:V4SI 0 "register_operand")
          (mult:V4SI
            (match_operand:V4SI 1 "register_operand")
            (vec_duplicate:V4SI
              (vec_select:SI
                (match_operand:V4SI 2 "register_operand")
                (match_operand:SI 3 "immediate_operand")))))]
    ...
  )

  "vlmovmn3" - move where one operand is a specific lane of a vector and the
  other is a scalar.

  Example:

  (define_insn "vlmovv4sisi3"
    [(set (vec_select:SI
            (match_operand:V4SI 0 "register_operand")
            (match_operand:SI 1 "immediate_operand"))
          (match_operand:SI 2 "memory_operand"))]
    ...
  )

- Identify the following idiom and expand through the above standard patterns:

  t = c[m]
  vc[0:n] = { t, t, t, t }
  a[0:n] = b[0:n] * vc[0:n]

  as

  (insn (set (vec_select:SI (reg:V4SI 0) 0) (mem:SI ... )))
  (insn (set (reg:V4SI 1)
             (mult:V4SI (reg:V4SI 2)
                        (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 0) 0)))))

If this path is acceptable, then I can extend this to support

  "vmaddim4" - multiply and add (with indexed element as multiplier)
  "vmsubim4" - multiply and subtract (with indexed element as multiplier)

Please let me know your thoughts.

Cheers
VP
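
For reference, the idiom above can be written out in C using GCC's generic
vector extensions. This is only an illustrative sketch, not part of the
proposed patterns: the function names mul_by_lane_scalar and mul_by_lane_vec4
are made up for the example, the v4si typedef stands in for V4SI and assumes a
32-bit int, and the comments only indicate which RTL operation each statement
is meant to correspond to.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector type standing in for V4SI (assumes 32-bit int). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar reference (hypothetical helper): a[i] = b[i] * c[m]. */
void
mul_by_lane_scalar (int *a, const int *b, const int *c, int m, int n)
{
  int t = c[m];
  for (int i = 0; i < n; i++)
    a[i] = b[i] * t;
}

/* Vector form of the same loop (hypothetical helper).
   n must be a multiple of 4.  */
void
mul_by_lane_vec4 (int *a, const int *b, const int *c, int m, int n)
{
  assert (n % 4 == 0);

  v4si vc = { 0, 0, 0, 0 };
  vc[0] = c[m];                       /* single lane load into lane 0 */

  int t = vc[0];                      /* vec_select:SI of lane 0 */
  v4si vt = { t, t, t, t };           /* vec_duplicate:V4SI */

  for (int i = 0; i < n; i += 4)
    {
      v4si vb, va;
      memcpy (&vb, b + i, sizeof vb); /* vb = b[i:i+3] */
      va = vb * vt;                   /* mult:V4SI */
      memcpy (a + i, &va, sizeof va); /* a[i:i+3] = va */
    }
}
```

Both functions should give the same result for any n that is a multiple of 4;
whether the compiler actually emits a by-element multiply for the second one
depends on the target and on the patterns it provides.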