From: Ira Rosen
To: Richard Guenther
Cc: gcc@gcc.gnu.org, rdsandiford@googlemail.com
Date: Wed, 23 Mar 2011 10:02:00 -0000
Subject: Re: Fw: RFC: Representing vector lane load/store operations

>> ...Ira would know best, but I don't think it would be used for this
>> kind of loop.  It would be more something like:
>>
>>   for (i=0; i
>>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>>
>> (not a realistic example).  You'd then have:
>>
>>    compoundY = __builtin_load_lanes (Y);
>>    red = ARRAY_REF
>>    green = ARRAY_REF
>>    blue = ARRAY_REF
>>    D1 = red + green
>>    D2 = D1 + blue
>>    MEM_REF = D2;
>>
>> My understanding is that we'd never do any operations besides ARRAY_REFs
>> on the compound value, and that the individual vectors would be treated
>> pretty much like any other.
>
> Ok, I thought it might be used to have a larger vectorization factor for
> loads and stores, basically make further unrolling cheaper because you
> don't have to duplicate the loads and stores.

Right, we can do that using vld1/vst1 instructions (full load/store with
N=1) and operate on up to 4 doubleword vectors in parallel. But at the
moment we are concentrating on efficient support of strided memory
accesses.

Ira
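
For readers following the thread, here is a minimal sketch in portable C of the strided access pattern under discussion. It is not GCC's implementation. The names `struct pixel`, `LANES`, `load_lanes3` and `sum_channels` are hypothetical and chosen only for illustration. `load_lanes3` is a scalar stand-in for the 3-way de-interleaving load that the quoted `__builtin_load_lanes` would represent (a vld3-style instruction on NEON). The vector width of 4 elements is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed number of elements per vector (illustrative) */

/* Hypothetical layout for Y[] in the quoted loop. */
struct pixel { int red, green, blue; };

/* Scalar stand-in for a 3-way lane load: reads LANES interleaved
   structs and splits them into one "vector" per field.  A real
   target would do this with one de-interleaving load. */
static void load_lanes3(const struct pixel *src, int out[3][LANES])
{
    for (size_t j = 0; j < LANES; j++) {
        out[0][j] = src[j].red;
        out[1][j] = src[j].green;
        out[2][j] = src[j].blue;
    }
}

/* X[i] = Y[i].red + Y[i].blue + Y[i].green, LANES elements per step.
   n must be a multiple of LANES; a real vectorizer would add an
   epilogue loop for the remainder. */
static void sum_channels(int *x, const struct pixel *y, size_t n)
{
    assert(n % LANES == 0);
    for (size_t i = 0; i < n; i += LANES) {
        int compound[3][LANES];              /* compoundY */
        load_lanes3(&y[i], compound);
        for (size_t j = 0; j < LANES; j++) { /* element-wise vector ops */
            int d1 = compound[0][j] + compound[1][j]; /* D1 = red + green */
            x[i + j] = d1 + compound[2][j];           /* D2 = D1 + blue */
        }
    }
}
```

The compound value is only ever indexed to extract the per-field vectors. All arithmetic happens on those individual vectors, which matches the understanding stated in the quoted text.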