From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16704 invoked by alias); 18 Apr 2011 12:20:20 -0000
Received: (qmail 16689 invoked by uid 22791); 18 Apr 2011 12:20:18 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-ww0-f51.google.com (HELO mail-ww0-f51.google.com) (74.125.82.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 18 Apr 2011 12:19:33 +0000
Received: by wwf26 with SMTP id 26so5275575wwf.8 for ; Mon, 18 Apr 2011 05:19:32 -0700 (PDT)
Received: by 10.216.144.28 with SMTP id m28mr10265758wej.77.1303129172233; Mon, 18 Apr 2011 05:19:32 -0700 (PDT)
Received: from richards-thinkpad (gbibp9ph1--blueice2n1.emea.ibm.com [195.212.29.75]) by mx.google.com with ESMTPS id u9sm3229859wbg.0.2011.04.18.05.19.30 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 18 Apr 2011 05:19:31 -0700 (PDT)
From: Richard Sandiford
To: Richard Guenther
Mail-Followup-To: Richard Guenther ,gcc-patches@gcc.gnu.org, patches@linaro.org, richard.sandiford@linaro.org
Cc: gcc-patches@gcc.gnu.org, patches@linaro.org
Subject: Re: [5/9] Main target-independent support for direct interleaving
References:
Date: Mon, 18 Apr 2011 12:58:00 -0000
In-Reply-To: (Richard Guenther's message of "Mon, 18 Apr 2011 14:08:50 +0200")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-04/txt/msg01377.txt.bz2

Richard Guenther writes:
> On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford
> wrote:
>> Richard Guenther writes:
>>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
>>> wrote:
>>>> Index: gcc/doc/md.texi
>>>> ===================================================================
>>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>>  consecutive memory locations, operand 1 is the first register, and
>>>>  operand 2 is a constant: the number of consecutive registers.
>>>>
>>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>>> +Perform an interleaved load of several vectors from memory operand 1
>>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>>> +while the memory operand is a flat array that contains the same number
>>>> +of elements.  The operation is equivalent to:
>>>> +
>>>> +@smallexample
>>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>>> +  for (i = 0; i < c; i++)
>>>> +    operand0[i][j] = operand1[j * c + i];
>>>> +@end smallexample
>>>> +
>>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>>> +from memory into a register of mode @samp{TI}@.  The register
>>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>>
>>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>>> such operation would have adjacent blocks of siv2qi memory.  But
>>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>>> redundant?  You could specify that we load NUNITS adjacent vectors into
>>> an integer mode of appropriate size.
>>
>> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
>> 8 consecutive V2QI registers.  The first element from register vector I
>> would come from operand1[I] and the second element would come from
>> operand1[I + 8].  That's meant to be a valid combination.
>
> Ok, but the C loop from the example doesn't seem to match.  Or I couldn't
> wrap my head around it despite looking for 5 minutes and already having
> coffee ;)  I would have expected the vectors being in memory as
>
>   v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
>
> not
>
>   v0[0], v1[0], v2[0], ...
>
> as I would have thought the former is more useful (simple unrolling for
> stride 2).

The second one's right.  All lane 0 elements, followed by all lane 1
elements, etc.  I think that's what the C loop says.

> We'd need a separate set of optabs for such an interleaving
> scheme?  In which case we might want to come up with a more
> specific name than load_lane?

Yeah, if someone has a single instruction that does your first example,
then it would need a new optab.  The individual vector pairs could be
represented using the current optab though, if each pair needs a separate
instruction.  E.g. with your v2qi example, vec_load_lanessiv2qi would load:

   v0[0], v1[0], v0[1], v1[1]

and you could repeat for the others.  So load_lanes (as defined here)
could be treated as a primitive, and your first example could be
something like "repeat_load_lanes".

If you don't like the name "load_lanes" though, I'm happy to use
something else.

Richard
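
For anyone who wants to check the layout by hand, here is a minimal
standalone sketch of the quoted loop for the tiv2qi case (c == 8, two
elements per vector).  It is only an illustration, not GCC code: the
names NVECS, NUNITS and load_lanes are made up for this example, and
plain arrays stand in for the register and memory operands.

```c
#include <stddef.h>

/* Hypothetical constants for the tiv2qi case discussed above:
   8 vectors (c == 8), each holding 2 QImode elements.  */
enum { NVECS = 8, NUNITS = 2 };

/* Illustration of "operand0[i][j] = operand1[j * c + i]".
   dst plays the role of the register operand (NVECS consecutive
   vectors) and src the flat memory operand.  Not a GCC interface.  */
static void
load_lanes (unsigned char dst[NVECS][NUNITS], const unsigned char *src)
{
  for (size_t j = 0; j < NUNITS; j++)      /* lane within each vector */
    for (size_t i = 0; i < NVECS; i++)     /* which vector */
      dst[i][j] = src[j * NVECS + i];
}
```

If the memory array is filled with 0, 1, ..., 15, vector I ends up
holding { I, I + 8 }, i.e. memory contains all lane 0 elements followed
by all lane 1 elements, which matches the description above.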