From: Richard Sandiford
To: Richard Guenther
Cc: gcc@gcc.gnu.org
Subject: Re: RFC: Representing vector lane load/store operations
Date: Tue, 22 Mar 2011 19:43:00 -0000
Message-ID: <87k4frlz5c.fsf@firetop.home>

Richard Guenther writes:
> Simple.  Just make them registers anyway (I did that in the past
> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
> on the decl.

OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can do
without it, because we do seem to cope quite well as things stand.
But I suppose that might not hold true as the examples get more
complicated.

> 4. a vector-of-vectors type
>
> Cons
>   * I don't think we want that ;)

Yeah :-)

>>    __builtin_load_lanes (REF : array N*M of X)
>>      returns array N of vector M of X
>>      maps to vldN on ARM
>>      in practice, the result would be used in assignments of the form:
>>        vectorY = ARRAY_REF
>>
>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>      returns array N*M of X
>>      maps to vstN on ARM
>>      in practice, the argument would be populated by assignments of the form:
>>        ARRAY_REF = vectorY
>>
>>    __builtin_load_lane (REF : array N of X,
>>                         VECTORS : array N of vector M of X,
>>                         LANE : integer)
>>      returns array N of vector M of X
>>      maps to vldN_lane on ARM
>>
>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>                          LANE : integer)
>>      returns array N of X
>>      maps to vstN_lane on ARM
>>
>>    __builtin_load_dup (REF : array N of X)
>>      returns array N of vector M of X
>>      maps to vldN_dup on ARM
>>
>> I've hacked up a prototype of this and it seems to produce good code.
>> What do you think?
>
> How do you expect these to be used?  That is, would you ever expect
> components of those large vectors/arrays be used in operations
> like add, or does the HW provide vector-lane variants for those as well?

The individual vectors would be used for add, etc.  That's what the
ARRAY_REF stuff above is supposed to be getting at.  So...

> Thus, will
>
>   for (i=0; i<N; i++)
>     X[i] = Y[i] + Z[i];
>
> result in a single add per vector lane load or a single vector lane load
> for M "unrolled" instances of (small) vector adds?  If the latter then
> we have to think about indexing the vector lanes as well as allowing
> partial stores (or have a vector-lane construct operation).  Representing
> vector lanes as automatic memory (with array of vector type) makes
> things easy, but eventually not very efficient.

...Ira would know best, but I don't think it would be used for this
kind of loop.  It would be more something like:

    for (i=0; i<N; i++)
      ...

    red = ARRAY_REF <...>
    green = ARRAY_REF <...>
    blue = ARRAY_REF <...>
    D1 = red + green
    D2 = D1 + blue
    MEM_REF <...> = D2;

My understanding is that we'd never do any operations besides
ARRAY_REFs on the compound value, and that the individual vectors
would be treated pretty much like any other.
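(For concreteness, here is roughly what a hand-written equivalent of that
example looks like with the existing NEON intrinsics.  This is only an
illustration rather than anything the vectoriser emits today; the function
name and the choice of float elements are made up, and it assumes N is a
multiple of 4.)

    #include <arm_neon.h>

    /* Sum the three interleaved components of Y into X, four elements
       at a time.  vld3q_f32 performs the NEON vld3 operation that
       __builtin_load_lanes is meant to expose: it fills an "array of
       three vectors", and each member vector is then used like any
       other vector operand.  N is assumed to be a multiple of 4.  */
    void
    sum_components (float *X, const float *Y, int N)
    {
      for (int i = 0; i < N; i += 4)
        {
          float32x4x3_t y = vld3q_f32 (Y + 3 * i);
          float32x4_t d1 = vaddq_f32 (y.val[0], y.val[1]);
          float32x4_t d2 = vaddq_f32 (d1, y.val[2]);
          vst1q_f32 (X + i, d2);
        }
    }

The compound value y only ever feeds the component accesses, and each
y.val[] then behaves like an ordinary vector, which is exactly the
pattern the ARRAY_REFs above are meant to describe.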
> I had new tree/stmt codes for array loads/stores for middle-end arrays.
> Eventually the vector lane support can at least walk in the same direction
> that middle-end arrays would ;)

What's the status of the middle-end array stuff?  A quick search turned
up your paper, but is it still WIP, or has it already gone in?
(Showing my ignorance of tree-level stuff here. :-))  It does sound like
it'd be a good fit for these ops.

Richard