Date: Mon, 30 Sep 2013 14:00:00 -0000
From: Vidya Praveen
To: Richard Biener
Cc: "gcc@gcc.gnu.org", "ook@ucw.cz"
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20130930140001.GF3460@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com> <20130924150425.GE22907@e103625-lin.cambridge.arm.com> <20130927145008.GA861@e103625-lin.cambridge.arm.com> <20130927151945.GB861@e103625-lin.cambridge.arm.com> <20130930125454.GD3460@e103625-lin.cambridge.arm.com>

On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote:
> On Mon, 30 Sep 2013, Vidya Praveen wrote:
>
> > On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> > > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> > > [...]
> > > > > > I can't really insist on the single lane load.. something like:
> > > > > >
> > > > > > vc:V4SI[0] = c
> > > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > > > va:V4SI = vb:V4SI vt:V4SI
> > > > > >
> > > > > > Or is there any other way to do this?
> > > > >
> > > > > Can you elaborate on "I can't really insist on the single lane load"?
> > > > > What's the single lane load in your example?
> > > >
> > > > Loading just one lane of the vector like this:
> > > >
> > > > vc:V4SI[0] = c // from the above scalar example
> > > >
> > > > or
> > > >
> > > > vc:V4SI[0] = c[2]
> > > >
> > > > is what I meant by single lane load. In this example:
> > > >
> > > > t = c[2]
> > > > ...
> > > > vb:v4si = b[0:3]
> > > > vc:v4si = { t, t, t, t }
> > > > va:v4si = vb:v4si vc:v4si
> > > >
> > > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be
> > > > seen as vec_select:SI (vect_t 0) ).
> > > >
> > > > > I'd expect the instruction
> > > > > pattern as quoted to just work (and I hope we expand a uniform
> > > > > constructor { a, a, a, a } properly using vec_duplicate).
> > > >
> > > > As much as I went through the code, this is only done using vect_init. It is
> > > > not expanded as vec_duplicate from, for example, store_constructor() of expr.c
> > >
> > > Do you see any issues if we expand such constructor as vec_duplicate directly
> > > instead of going through vect_init way?
> >
> > Sorry, that was a bad question.
> >
> > But here's what I would like to propose as a first step. Please tell me if this
> > is acceptable or if it makes sense:
> >
> > - Introduce standard pattern names
> >
> > "vmulim4" - vector multiply with second operand as indexed operand
> >
> > Example:
> >
> > (define_insn "vmuliv4si4"
> >   [(set (match_operand:V4SI 0 "register_operand")
> >         (mul:V4SI (match_operand:V4SI 1 "register_operand")
> >                   (vec_duplicate:V4SI
> >                     (vec_select:SI
> >                       (match_operand:V4SI 2 "register_operand")
> >                       (match_operand:V4SI 3 "immediate_operand")))))]
> >   ...
> > )
>
> We could factor this with providing a standard pattern name for
>
> (define_insn "vdupi"
>   [(set (match_operand: 0 "register_operand")
>         (vec_duplicate:
>           (vec_select:
>             (match_operand: 1 "register_operand")
>             (match_operand:SI 2 "immediate_operand"))))]

This is good. I did think about this but then I thought of avoiding the need
for combiner patterns :-)

But do you find the lane specific mov pattern I proposed, acceptable?

> (you use V4SI for the immediate?

Sorry typo again!! It should've been SI.

> Ideally vdupi has another custom
> mode for the vector index).
>
> Note that this factored pattern is already available as vec_perm_const!
> It is simply (vec_perm_const:V4SI ).
>
> Which means that on the GIMPLE level we should try to combine
>
> el_4 = BIT_FIELD_REF ;
> v_5 = { el_4, el_4, ... };

I don't think we reach this state at all for the scenarios in discussion.
What we generally have is:

el_4 = MEM_REF < array + index*size >
v_5 = { el_4, ... }

Or am I missing something?

>
> into
>
> v_5 = VEC_PERM_EXPR ;
>
> which it should already do with simplify_permutation.
>
> But I'm not sure what you are after at the end ;)
>
> Richard.
>

Regards
VP
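
For reference, here is a minimal plain-C sketch of the semantics discussed in this thread: broadcasting one lane of a vector (the vec_duplicate of a vec_select) and multiplying another vector by it. It is only an illustration and not GCC code; the type v4si and the helpers dup_lane() and mul_by_lane() are hypothetical names, and multiplication is assumed as the operation because the proposed pattern uses mul.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical stand-in for a V4SI register: four 32-bit integer lanes. */
typedef struct { int32_t lane[LANES]; } v4si;

/* Models (vec_duplicate:V4SI (vec_select:SI v idx)):
   copy lane idx of v into every lane of the result. */
static v4si dup_lane(v4si v, int idx)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = v.lane[idx];
    return r;
}

/* Models the proposed multiply-by-indexed-element operation:
   a[i] = b[i] * c[idx] for every lane i. */
static v4si mul_by_lane(v4si b, v4si c, int idx)
{
    v4si t = dup_lane(c, idx);
    v4si a;
    for (int i = 0; i < LANES; i++)
        a.lane[i] = b.lane[i] * t.lane[i];
    return a;
}
```

With b = {1, 2, 3, 4} and c = {10, 20, 30, 40}, mul_by_lane(b, c, 2) corresponds to the scalar loop a[i] = b[i] * c[2] from the example quoted above.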