From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-289773-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16704 invoked by alias); 18 Apr 2011 12:20:20 -0000
Received: (qmail 16689 invoked by uid 22791); 18 Apr 2011 12:20:18 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-ww0-f51.google.com (HELO mail-ww0-f51.google.com) (74.125.82.51)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 18 Apr 2011 12:19:33 +0000
Received: by wwf26 with SMTP id 26so5275575wwf.8        for <gcc-patches@gcc.gnu.org>; Mon, 18 Apr 2011 05:19:32 -0700 (PDT)
Received: by 10.216.144.28 with SMTP id m28mr10265758wej.77.1303129172233;        Mon, 18 Apr 2011 05:19:32 -0700 (PDT)
Received: from richards-thinkpad (gbibp9ph1--blueice2n1.emea.ibm.com [195.212.29.75])        by mx.google.com with ESMTPS id u9sm3229859wbg.0.2011.04.18.05.19.30        (version=TLSv1/SSLv3 cipher=OTHER);        Mon, 18 Apr 2011 05:19:31 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@linaro.org>
To: Richard Guenther <richard.guenther@gmail.com>
Mail-Followup-To: Richard Guenther <richard.guenther@gmail.com>,gcc-patches@gcc.gnu.org,  patches@linaro.org, richard.sandiford@linaro.org
Cc: gcc-patches@gcc.gnu.org,  patches@linaro.org
Subject: Re: [5/9] Main target-independent support for direct interleaving
References: <g4pqorfvwp.fsf@linaro.org> <g44o63fu4r.fsf@linaro.org>	<BANLkTi=Q07DdYYiP_oWMbHtVud4z_xr9Bw@mail.gmail.com>	<g4bp03ajjw.fsf@linaro.org>	<BANLkTinE+1mu3NVAEzYDMZ3-uFu-Fv+G4g@mail.gmail.com>
Date: Mon, 18 Apr 2011 12:58:00 -0000
In-Reply-To: <BANLkTinE+1mu3NVAEzYDMZ3-uFu-Fv+G4g@mail.gmail.com> (Richard	Guenther's message of "Mon, 18 Apr 2011 14:08:50 +0200")
Message-ID: <g47harah0u.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-04/txt/msg01377.txt.bz2

Richard Guenther <richard.guenther@gmail.com> writes:
> On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Index: gcc/doc/md.texi
>>>> ===================================================================
>>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>>  consecutive memory locations, operand 1 is the first register, and
>>>>  operand 2 is a constant: the number of consecutive registers.
>>>>
>>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>>> +Perform an interleaved load of several vectors from memory operand 1
>>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>>> +while the memory operand is a flat array that contains the same number
>>>> +of elements.  The operation is equivalent to:
>>>> +
>>>> +@smallexample
>>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>>> +  for (i = 0; i < c; i++)
>>>> +    operand0[i][j] = operand1[j * c + i];
>>>> +@end smallexample
>>>> +
>>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>>> +from memory into a register of mode @samp{TI}@.  The register
>>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>>
>>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>>> such operation would have adjacent blocks of siv2qi memory.  But
>>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>>> redundant?  You could specify that we load NUNITS adjacent vectors into
>>> an integer mode of appropriate size.
>>
>> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
>> 8 consecutive V2QI registers.  The first element from register vector I
>> would come from operand1[I] and the second element would come from
>> operand1[I + 8]. =C2=A0That's meant to be a valid combination.
>
> Ok, but the C loop from the example doesn't seem to match.  Or I couldn't
> wrap my head around it despite looking for 5 minutes and already having
> coffee ;)  I would have expected the vectors being in memory as
>
>   v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
>
> not
>
>   v0[0], v1[0], v2[0], ...
>
> as I would have thought the former is more useful (simple unrolling for
> stride 2).

The second one's right.  All lane 0 elements, followed by all lane 1
elements, etc.  I think that's what the C loop says.
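
To make the layout concrete, here is a rough model of that loop on plain
byte arrays.  It is only a sketch: model_load_lanes is a hypothetical
helper name, not something in the patch, and the register operand is
assumed to be flattened as regs[i * nunits + j] (vector i, lane j).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: models the documented
   vec_load_lanes loop on plain arrays.  C is the number of vectors
   (GET_MODE_SIZE (m) / GET_MODE_SIZE (n)) and NUNITS is the number of
   lanes per vector (GET_MODE_NUNITS (n)).  REGS is assumed to be laid
   out as regs[i * nunits + j], i.e. vector i, lane j.  */
static void
model_load_lanes (unsigned char *regs, const unsigned char *mem,
                  size_t c, size_t nunits)
{
  assert (c > 0 && nunits > 0);
  /* Memory holds all lane 0 elements, then all lane 1 elements, etc.  */
  for (size_t j = 0; j < nunits; j++)
    for (size_t i = 0; i < c; i++)
      regs[i * nunits + j] = mem[j * c + i];
}
```

With c == 8 and nunits == 2 (the TI/V2QI case above), vector I gets
mem[I] in lane 0 and mem[I + 8] in lane 1.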

> We'd need a separate set of optabs for such an interleaving
> scheme?  In which case we might want to come up with a more
> specific name than load_lane?

Yeah, if someone has a single instruction that does your first example,
then it would need a new optab.  The individual vector pairs could be
represented using the current optab though, if each pair needs a
separate instruction.  E.g. with your v2qi example, vec_load_lanessiv2qi
would load:

   v0[0], v1[0], v0[1], v1[1]

and you could repeat for the others.  So load_lanes (as defined here)
could be treated as a primitive, and your first example could be something
like "repeat_load_lanes".
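
As a rough sketch of that idea (again just a model on byte arrays;
"repeat_load_lanes" is only the placeholder name used above, and
model_repeat_load_lanes and its parameters are hypothetical), the
repeated form would apply one load_lanes step per group of vectors:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only: "repeat_load_lanes" is just a placeholder
   name, not an existing optab.  Each group of C * NUNITS memory
   elements is handled by one load_lanes-style step that fills C
   vectors of NUNITS lanes.  REGS is assumed to be laid out as
   regs[v * nunits + j], i.e. vector v, lane j.  */
static void
model_repeat_load_lanes (unsigned char *regs, const unsigned char *mem,
                         size_t groups, size_t c, size_t nunits)
{
  assert (c > 0 && nunits > 0);
  for (size_t g = 0; g < groups; g++)
    {
      const unsigned char *src = mem + g * c * nunits;
      unsigned char *dst = regs + g * c * nunits;
      /* One load_lanes step, same loop as in the md.texi example.  */
      for (size_t j = 0; j < nunits; j++)
        for (size_t i = 0; i < c; i++)
          dst[i * nunits + j] = src[j * c + i];
    }
}
```

With groups == 2, c == 2 and nunits == 2, memory is consumed in the
order v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1].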

If you don't like the name "load_lanes" though, I'm happy to use
something else.

Richard