From: Richard Guenther
To: Richard Guenther , gcc@gcc.gnu.org, rdsandiford@googlemail.com
Subject: Re: RFC: Representing vector lane load/store operations
Date: Wed, 23 Mar 2011 09:23:00 -0000
In-Reply-To: <87k4frlz5c.fsf@firetop.home>
References: <87k4frlz5c.fsf@firetop.home>

On Tue, Mar 22, 2011 at 8:43 PM, Richard Sandiford wrote:
> Richard Guenther writes:
>> Simple.  Just make them registers anyway (I did that in the past
>> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
>> on the decl.
>
> OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can
> do without it, because we do seem to cope quite well as things stand.
> But I suppose that might not hold true as the examples get more complicated.
>
>>   4.
>>      a vector-of-vectors type
>>
>>      Cons
>>         * I don't think we want that ;)
>
> Yeah :-)
>
>>>    __builtin_load_lanes (REF : array N*M of X)
>>>      returns array N of vector M of X
>>>      maps to vldN on ARM
>>>      in practice, the result would be used in assignments of the form:
>>>        vectorY = ARRAY_REF
>>>
>>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>>      returns array N*M of X
>>>      maps to vstN on ARM
>>>      in practice, the argument would be populated by assignments of the form:
>>>        ARRAY_REF = vectorY
>>>
>>>    __builtin_load_lane (REF : array N of X,
>>>                         VECTORS : array N of vector M of X,
>>>                         LANE : integer)
>>>      returns array N of vector M of X
>>>      maps to vldN_lane on ARM
>>>
>>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>>                          LANE : integer)
>>>      returns array N of X
>>>      maps to vstN_lane on ARM
>>>
>>>    __builtin_load_dup (REF : array N of X)
>>>      returns array N of vector M of X
>>>      maps to vldN_dup on ARM
>>>
>>> I've hacked up a prototype of this and it seems to produce good code.
>>> What do you think?
>>
>> How do you expect these to be used?  That is, would you ever expect
>> components of those large vectors/arrays be used in operations
>> like add, or does the HW provide vector-lane variants for those as well?
>
> The individual vectors would be used for add, etc.  That's what the
> ARRAY_REF stuff above is supposed to be getting at.  So...
>
>> Thus, will
>>
>>   for (i=0; i<...; i++)
>>     X[i] = Y[i] + Z[i];
>>
>> result in a single add per vector lane load or a single vector lane load
>> for M "unrolled" instances of (small) vector adds?
>> If the latter then
>> we have to think about indexing the vector lanes as well as allowing
>> partial stores (or have a vector-lane construct operation).  Representing
>> vector lanes as automatic memory (with array of vector type) makes
>> things easy, but eventually not very efficient.
>
> ...Ira would know best, but I don't think it would be used for this
> kind of loop.  It would be more something like:
>
>   for (i=0; i<...; i++)
>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>
> (not a realistic example).  You'd then have:
>
>    compoundY = __builtin_load_lanes (Y);
>    red = ARRAY_REF
>    green = ARRAY_REF
>    blue = ARRAY_REF
>    D1 = red + green
>    D2 = D1 + blue
>    MEM_REF = D2;
>
> My understanding is that'd we never do any operations besides ARRAY_REFs
> on the compound value, and that the individual vectors would be treated
> pretty much like any other.

Ok, I thought it might be used to have a larger vectorization factor for
loads and stores, basically make further unrolling cheaper because you
don't have to duplicate the loads and stores.

>> I had new tree/stmt codes for array loads/stores for middle-end arrays.
>> Eventually the vector lane support can at least walk in the same direction
>> that middle-end arrays would ;)
>
> What's the status of the middle-end array stuff?  A quick search
> showed up your paper, but is it still WIP, or has it already gone in?
> (Showing my ignorance of tree-level stuff here. :-))  It does sound
> like it'd be a good fit for these ops.

Well, the work is basically suspended (though a lot of middle-end
surgery that was required went in) - I was stuck on the necessity
to have the Fortran frontend generate these expressions to have
testing on real code (rather than constructing examples from my
lame C frontend + builtins hack).
ISTR porting the patch to tuples, the current patch seems to
have two or three places that adjust the middle-end in order to
allow aggregate typed SSA names.

But as you have partial defs of the vector lane array the simplest
approach is probably to not make them a register.  Be prepared
for some surprises during RTL expansion though ;)

Richard.

> Richard
>
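
For readers following the thread, here is a minimal scalar sketch in C of what the proposed load_lanes/store_lanes operations appear to do, applied to the red/green/blue example quoted above. It is an illustration only, not the prototype discussed in the thread. The names load_lanes_model, store_lanes_model and sum_channels are hypothetical, as are the sizes N = 3 and M = 4, the int element type, and the red, green, blue ordering in memory. The de-interleaving pattern assumed here is the one ARM's vldN instructions use.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes: N vectors (red, green, blue) of M lanes each. */
enum { N = 3, M = 4 };

/* Scalar model of the proposed load_lanes operation:
   REF is "array N*M of X" holding interleaved elements; the result is
   "array N of vector M of X", where vector v collects every N-th element
   starting at offset v. */
void load_lanes_model(const int ref[N * M], int vectors[N][M])
{
    assert(ref != NULL && vectors != NULL);
    for (int lane = 0; lane < M; lane++)
        for (int v = 0; v < N; v++)
            vectors[v][lane] = ref[lane * N + v];
}

/* Scalar model of the proposed store_lanes operation: the inverse
   permutation, re-interleaving N vectors into one array of N*M elements. */
void store_lanes_model(int vectors[N][M], int out[N * M])
{
    assert(vectors != NULL && out != NULL);
    for (int lane = 0; lane < M; lane++)
        for (int v = 0; v < N; v++)
            out[lane * N + v] = vectors[v][lane];
}

/* One vectorized iteration of X[i] = Y[i].red + Y[i].green + Y[i].blue.
   The compound value is only ever indexed per vector, which mirrors the
   ARRAY_REF-only usage described in the thread. */
void sum_channels(const int y[N * M], int x[M])
{
    int compound[N][M];
    load_lanes_model(y, compound);

    const int *red = compound[0];
    const int *green = compound[1];
    const int *blue = compound[2];

    for (int lane = 0; lane < M; lane++) {
        int d1 = red[lane] + green[lane];  /* D1 = red + green */
        x[lane] = d1 + blue[lane];         /* D2 = D1 + blue   */
    }
}
```

In this model the compound array is fully defined by a single load and then only read one vector at a time. The lane-wise variants (load_lane, store_lane) would instead update one lane of each vector, which is the "partial defs" case mentioned in the reply.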