From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-167679-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 12187 invoked by alias); 23 Mar 2011 09:23:22 -0000
Received: (qmail 12162 invoked by uid 22791); 23 Mar 2011 09:23:19 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-ww0-f51.google.com (HELO mail-ww0-f51.google.com) (74.125.82.51)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 23 Mar 2011 09:23:13 +0000
Received: by wwj40 with SMTP id 40so8878635wwj.8        for <gcc@gcc.gnu.org>; Wed, 23 Mar 2011 02:23:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.227.195.76 with SMTP id eb12mr6206098wbb.160.1300872191102; Wed, 23 Mar 2011 02:23:11 -0700 (PDT)
Received: by 10.227.64.142 with HTTP; Wed, 23 Mar 2011 02:23:11 -0700 (PDT)
In-Reply-To: <87k4frlz5c.fsf@firetop.home>
References: <g4lj07w10t.fsf@linaro.org>	<AANLkTi=mKUpvPTvcz83QoyufYDodcc_DeLut-mrVHqs0@mail.gmail.com>	<87k4frlz5c.fsf@firetop.home>
Date: Wed, 23 Mar 2011 09:23:00 -0000
Message-ID: <AANLkTinjtkdp+-3oio2Lvg3vyyjSQNFiebEF6Rzg8s71@mail.gmail.com>
Subject: Re: RFC: Representing vector lane load/store operations
From: Richard Guenther <richard.guenther@gmail.com>
To: Richard Guenther <richard.guenther@gmail.com>, gcc@gcc.gnu.org, rdsandiford@googlemail.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2011-03/txt/msg00337.txt.bz2

On Tue, Mar 22, 2011 at 8:43 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> Simple.  Just make them registers anyway (I did that in the past
>> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
>> on the decl.
>
> OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can
> do without it, because we do seem to cope quite well as things stand.
> But I suppose that might not hold true as the examples get more complicated.
>
>>   4. a vector-of-vectors type
>>
>>      Cons
>>         * I don't think we want that ;)
>
> Yeah :-)
>
>>>    __builtin_load_lanes (REF : array N*M of X)
>>>      returns array N of vector M of X
>>>      maps to vldN on ARM
>>>      in practice, the result would be used in assignments of the form:
>>>        vectorY = ARRAY_REF <result, Y>
>>>
>>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>>      returns array N*M of X
>>>      maps to vstN on ARM
>>>      in practice, the argument would be populated by assignments of the form:
>>>        ARRAY_REF <VECTORS, Y> = vectorY
>>>
>>>    __builtin_load_lane (REF : array N of X,
>>>                         VECTORS : array N of vector M of X,
>>>                         LANE : integer)
>>>      returns array N of vector M of X
>>>      maps to vldN_lane on ARM
>>>
>>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>>                          LANE : integer)
>>>      returns array N of X
>>>      maps to vstN_lane on ARM
>>>
>>>    __builtin_load_dup (REF : array N of X)
>>>      returns array N of vector M of X
>>>      maps to vldN_dup on ARM
>>>
>>> I've hacked up a prototype of this and it seems to produce good code.
>>> What do you think?
>>
>> How do you expect these to be used?  That is, would you ever expect
>> components of those large vectors/arrays to be used in operations
>> like add, or does the HW provide vector-lane variants for those as well?
>
> The individual vectors would be used for add, etc.  That's what the
> ARRAY_REF stuff above is supposed to be getting at.  So...
>
>> Thus, will
>>
>>   for (i=0; i<N; ++i)
>>     X[i] = Y[i] + Z[i];
>>
>> result in a single add per vector lane load or a single vector lane load
>> for M "unrolled" instances of (small) vector adds?  If the latter then
>> we have to think about indexing the vector lanes as well as allowing
>> partial stores (or have a vector-lane construct operation).  Representing
>> vector lanes as automatic memory (with array of vector type) makes
>> things easy, but eventually not very efficient.
>
> ...Ira would know best, but I don't think it would be used for this
> kind of loop.  It would be more something like:
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>
> (not a realistic example).  You'd then have:
>
>    compoundY = __builtin_load_lanes (Y);
>    red = ARRAY_REF <compoundY, 0>
>    green = ARRAY_REF <compoundY, 1>
>    blue = ARRAY_REF <compoundY, 2>
>    D1 = red + green
>    D2 = D1 + blue
>    MEM_REF <X> = D2;
>
> My understanding is that we'd never do any operations besides ARRAY_REFs
> on the compound value, and that the individual vectors would be treated
> pretty much like any other.

Ok, I thought it might be used to have a larger vectorization factor for
loads and stores, basically make further unrolling cheaper because you
don't have to duplicate the loads and stores.
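
For reference, the de-interleaving in the quoted example can be modelled
in plain scalar C.  This is only a sketch of the semantics as described
in the proposal, not GCC code: the names model_load_lanes,
model_store_lanes and sum_components, and the sizes NVECS and LANES, are
made up for illustration.

```c
#include <stddef.h>

#define NVECS 3   /* N: vectors per compound value (red, green, blue); assumed */
#define LANES 4   /* M: elements per vector; assumed */

/* Hypothetical scalar model of __builtin_load_lanes: read NVECS*LANES
   interleaved elements from REF and split them into NVECS vectors of
   LANES elements each (the vldN pattern).  */
void
model_load_lanes (const int *ref, int vectors[NVECS][LANES])
{
  for (size_t lane = 0; lane < LANES; ++lane)
    for (size_t v = 0; v < NVECS; ++v)
      vectors[v][lane] = ref[lane * NVECS + v];
}

/* Hypothetical scalar model of __builtin_store_lanes: the inverse
   operation, interleaving NVECS vectors back into REF (the vstN pattern).  */
void
model_store_lanes (int vectors[NVECS][LANES], int *ref)
{
  for (size_t lane = 0; lane < LANES; ++lane)
    for (size_t v = 0; v < NVECS; ++v)
      ref[lane * NVECS + v] = vectors[v][lane];
}

/* Model of the quoted loop body for one group of LANES elements:
   the compound value is only ever indexed (the ARRAY_REFs), and the
   individual vectors are then added like any other vectors.  */
void
sum_components (const int *y, int *x)
{
  int compound[NVECS][LANES];

  model_load_lanes (y, compound);
  for (size_t lane = 0; lane < LANES; ++lane)
    x[lane] = compound[0][lane] + compound[1][lane] + compound[2][lane];
}
```

Here compound[0], compound[1] and compound[2] play the role of the three
ARRAY_REF results, and the final loop stands in for the two vector adds.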

>> I had new tree/stmt codes for array loads/stores for middle-end arrays.
>> Eventually the vector lane support can at least walk in the same direction
>> that middle-end arrays would ;)
>
> What's the status of the middle-end array stuff?  A quick search
> showed up your paper, but is it still WIP, or has it already gone in?
> (Showing my ignorance of tree-level stuff here. :-))  It does sound
> like it'd be a good fit for these ops.

Well, the work is basically suspended (though a lot of middle-end
surgery that was required went in) - I was stuck on the necessity
to have the Fortran frontend generate these expressions to have
testing on real code (rather than constructing examples from my
lame C frontend + builtins hack).  ISTR porting the patch to tuples;
the current patch seems to have two or three places that adjust
the middle-end in order to allow aggregate-typed SSA names.

But as you have partial defs of the vector lane array the simplest
approach is probably to not make them a register.  Be prepared
for some surprises during RTL expansion though ;)
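
To illustrate what I mean by partial defs: an operation like the proposed
__builtin_load_lane only writes one lane of each vector, so it has to
take the old compound value as an input.  A rough scalar model in C
follows; the type compound_t, the function model_load_lane and the sizes
VEC_COUNT and LANE_COUNT are invented for this sketch and are not GCC
code.

```c
#include <stddef.h>

#define VEC_COUNT 2    /* N: vectors per compound value; assumed */
#define LANE_COUNT 4   /* M: elements per vector; assumed */

/* Hypothetical stand-in for "array N of vector M of X".  */
typedef struct
{
  int vec[VEC_COUNT][LANE_COUNT];
} compound_t;

/* Hypothetical scalar model of __builtin_load_lane: return a copy of
   OLD in which lane LANE of vector V has been replaced by REF[V].
   All other lanes are carried over unchanged, which is why the
   operation is a partial definition of the compound value.  */
compound_t
model_load_lane (const int *ref, compound_t old, size_t lane)
{
  compound_t result = old;

  for (size_t v = 0; v < VEC_COUNT; ++v)
    result.vec[v][lane] = ref[v];
  return result;
}
```

If the compound value were a register, every such update would have to
be expressed as a full new definition taking the old value as an
operand, as the by-value signature above does.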

Richard.

> Richard
>