Date: Tue, 24 Sep 2013 15:04:00 -0000
From: Vidya Praveen
To: "gcc@gcc.gnu.org"
Cc: "rguenther@suse.de", "ook@ucw.cz"
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20130924150425.GE22907@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>

On Mon, Sep 09, 2013 at 07:02:52PM +0100, Marc Glisse wrote:
> On Mon, 9 Sep 2013, Vidya Praveen wrote:
>
> > Hello,
> >
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of SIMD instructions that allow an indexed element
> > as an operand, thus reducing the need for duplication and possibly improving
> > reuse of previously loaded data.
> >
> > Appreciate your opinion on this.
> >
> > ---
> >
> > A phrase like this:
> >
> >   for(i=0;i<4;i++)
> >     a[i] = b[i] <op> c[2];
> >
> > is usually vectorized as:
> >
> >   va:V4SI = a[0:3]
> >   vb:V4SI = b[0:3]
> >   t = c[2]
> >   vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >   ...
> >   va:V4SI = vb:V4SI <op> vc:V4SI
> >
> > But this could be simplified further if a target has instructions that support
> > an indexed element as a parameter. For example, an instruction like this:
> >
> >   mul v0.4s, v1.4s, v2.4s[2]
> >
> > can perform multiplication of each element of v1.4s with the third element of
> > v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> >
> > For this to happen, the vectorizer needs to understand this idiom and treat the
> > operand c[2] specially (taking into consideration, through a target hook or
> > macro, whether the machine supports an indexed element as an operand for <op>)
> > and consider this a vectorizable statement without having to duplicate the
> > elements explicitly.
> >
> > There are a few ways this could be represented at gimple:
> >
> >   ...
> >   va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >   ...
> >
> > or by allowing the vectorizer to treat an indexed element as a valid operand in a
> > vectorizable statement:
>
> Might as well allow any scalar then...

Yes, I had given an example below.

>
> >   ...
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >   ...
> >
> > For the sake of explanation, the above two representations assume that
> > c[0:3] is loaded into vc for some other use and reused here. But when c[2] is the
> > only use of 'c', it may be safer to just load one element and use it like
> > this:
> >
> >   vc:V4SI[0] = c[2]
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > This could also mean that expressions involving a scalar could be treated
> > similarly.
> > For example,
> >
> >   for(i=0;i<4;i++)
> >     a[i] = b[i] <op> c
> >
> > could be vectorized as:
> >
> >   vc:V4SI[0] = c
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > Such a change would also require new standard pattern names to be defined for
> > each <op>.
> >
> > Alternatively, having something like this:
> >
> >   ...
> >   vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >   va:V4SI = vb:V4SI <op> vt:V4SI
> >   ...
> >
> > would remove the need to introduce several new standard pattern names and need
> > just one to represent vec_duplicate(vec_select()), but of course this would
> > expect the target to have combiner patterns.
>
> The cost estimation wouldn't be very good, but aren't combine patterns
> enough for the whole thing? Don't you model your mul instruction as:
>
> (mult:V4SI
>   (match_operand:V4SI)
>   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))
>
> anyway? Seems that combine should be able to handle it. What currently
> happens that we fail to generate the right instruction?

At vec_init, I can recognize an idiom in order to generate vec_duplicate, but
I can't really insist on the single-lane load. Something like:

  vc:V4SI[0] = c
  vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
  va:V4SI = vb:V4SI <op> vt:V4SI

Or is there any other way to do this?

Cheers
VP

>
> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR
> for vec_duplicate, adding new nodes is always painful.
>
> > This enhancement could possibly help further optimizing larger scenarios such
> > as linear systems.
> >
> > Regards
> > VP
>
> --
> Marc Glisse
>
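
For reference, the three forms discussed in this thread can be compared in plain C. This is only a sketch of the intended semantics, not vectorizer or backend code: the function names (scalar_loop, dup_form, lane_form) are hypothetical, the operator is assumed to be multiplication as in the mul example, and a four-element int array stands in for a V4SI register.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* stands in for V4SI: four 32-bit integer lanes */

/* Scalar reference: the loop from the original post, with the
   operator assumed to be '*'. */
void scalar_loop(int a[LANES], const int b[LANES], const int c[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        a[i] = b[i] * c[2];
}

/* Usual vectorized shape: broadcast c[2] into a temporary vector
   (the vec_duplicate step), then multiply lane by lane. */
void dup_form(int a[LANES], const int b[LANES], const int c[LANES])
{
    int vc[LANES];
    int t = c[2];

    for (size_t i = 0; i < LANES; i++)
        vc[i] = t;
    for (size_t i = 0; i < LANES; i++)
        a[i] = b[i] * vc[i];
}

/* Indexed-element shape: every lane of b is multiplied by one selected
   lane of vc, so no broadcast vector is materialized. */
void lane_form(int a[LANES], const int b[LANES], const int vc[LANES],
               size_t lane)
{
    assert(lane < LANES);
    for (size_t i = 0; i < LANES; i++)
        a[i] = b[i] * vc[lane];
}
```

Calling scalar_loop(a, b, c), dup_form(a, b, c) and lane_form(a, b, c, 2) on the same inputs should fill a with identical values; the only difference between the last two is whether the duplicated vector exists as a separate value.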