Date: Tue, 24 Sep 2013 15:04:00 -0000
From: Vidya Praveen <vidyapraveen@arm.com>
To: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Cc: "rguenther@suse.de" <rguenther@suse.de>, "ook@ucw.cz" <ook@ucw.cz>
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20130924150425.GE22907@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com> <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr>
In-Reply-To: <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr>

On Mon, Sep 09, 2013 at 07:02:52PM +0100, Marc Glisse wrote:
> On Mon, 9 Sep 2013, Vidya Praveen wrote:
>
> > Hello,
> >
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of SIMD instructions that allow an indexed element
> > as an operand, thus reducing the need for duplication and possibly
> > improving reuse of previously loaded data.
> >
> > Appreciate your opinion on this.
> >
> > ---
> >
> > A loop like this:
> >
> > for(i=0;i<4;i++)
> >   a[i] = b[i] <op> c[2];
> >
> > is usually vectorized as:
> >
> >  va:V4SI = a[0:3]
> >  vb:V4SI = b[0:3]
> >  t = c[2]
> >  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >  ...
> >  va:V4SI = vb:V4SI <op> vc:V4SI
> >
> > But this could be simplified further if a target has instructions that
> > support an indexed element as an operand. For example an instruction like
> > this:
> >
> >  mul v0.4s, v1.4s, v2.4s[2]
> >
> > can perform multiplication of each element of v1.4s with the third element
> > of v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> >
> > For this to happen, the vectorizer needs to understand this idiom and treat
> > the operand c[2] specially (taking into consideration, through a target
> > hook or macro, whether the machine supports an indexed element as an
> > operand for <op>) and consider this a vectorizable statement without
> > having to duplicate the elements explicitly.
> >
> > There are a few ways this could be represented at gimple:
> >
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  ...
> >
> > or by allowing the vectorizer to treat an indexed element as a valid
> > operand in a vectorizable statement:
>
> Might as well allow any scalar then...

Yes, I had given an example below.
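
To make the connection concrete, here is a minimal C sketch of the two loops,
with <op> taken to be multiplication. The function names are made up purely
for illustration, and hoisting c[2] out of the loop assumes that a and c do
not overlap. Once the element is hoisted it is just another loop-invariant
scalar, so both loops have the same shape from the vectorizer's point of view.

```c
#include <stddef.h>

/* Indexed-element form: a[i] = b[i] * c[2].
   Assumes a and c do not overlap, so c[2] can be read once up front. */
void mul_by_element(int *a, const int *b, const int *c, size_t n)
{
    const int t = c[2];   /* loop-invariant element */
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * t;
}

/* Plain scalar form: a[i] = b[i] * c. */
void mul_by_scalar(int *a, const int *b, int c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c;
}
```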

>
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >  ...
> >
> > For the sake of explanation, the above two representations assume that
> > c[0:3] is loaded in vc for some other use and reused here. But when c[2] is
> > the only use of 'c' then it may be safer to just load one element and use it
> > like this:
> >
> >  vc:V4SI[0] = c[2]
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > This could also mean that expressions involving a scalar could be treated
> > similarly. For example,
> >
> >  for(i=0;i<4;i++)
> >    a[i] = b[i] <op> c
> >
> > could be vectorized as:
> >
> >  vc:V4SI[0] = c
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > Such a change would also require new standard pattern names to be defined
> > for each <op>.
> >
> > Alternatively, having something like this:
> >
> >  ...
> >  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  va:V4SI = vb:V4SI <op> vt:V4SI
> >  ...
> >
> > would remove the need to introduce several new standard pattern names and
> > need just one to represent vec_duplicate(vec_select()), but of course this
> > will expect the target to have combiner patterns.
>
> The cost estimation wouldn't be very good, but aren't combine patterns
> enough for the whole thing? Don't you model your mul instruction as:
>
> (mult:V4SI
>    (match_operand:V4SI)
>    (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))
>
> anyway? Seems that combine should be able to handle it. What currently
> happens that we fail to generate the right instruction?

At vec_init, I can recognize an idiom in order to generate vec_duplicate, but
I can't really insist on the single-lane load. Something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI <op> vt:V4SI

Or is there any other way to do this?
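
To spell out the sequence I have in mind, here is a rough C sketch using GCC's
generic vector extensions rather than GIMPLE. The names v4si, dup_lane and
mul_by_lane0 are made up for illustration, and <op> is taken to be
multiplication. Only lane 0 of vc is loaded from c; the remaining lanes are
zeroed here just to keep the sketch well defined.

```c
/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* vec_duplicate (vec_select (v, lane)): broadcast one lane to all four. */
static v4si dup_lane(v4si v, int lane)
{
    int t = v[lane];
    return (v4si){ t, t, t, t };
}

/* a[0:3] = b[0:3] * c, placing c in a single lane only. */
void mul_by_lane0(int *a, const int *b, int c)
{
    v4si vb = { b[0], b[1], b[2], b[3] };
    v4si vc = { 0, 0, 0, 0 };
    vc[0] = c;                   /* vc:V4SI[0] = c */
    v4si vt = dup_lane(vc, 0);   /* vt = vec_duplicate (vec_select vc 0) */
    v4si va = vb * vt;           /* va = vb <op> vt */
    for (int i = 0; i < 4; i++)
        a[i] = va[i];
}
```

On a target with a by-element multiply, the hope would be that the
dup_lane step is folded into the multiply itself instead of being
materialized as a separate duplicate.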

Cheers
VP

>
> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR
> for vec_duplicate, adding new nodes is always painful.
>
> > This enhancement could possibly help further optimizing larger scenarios
> > such as linear systems.
> >
> > Regards
> > VP
>
> -- 
> Marc Glisse
>