Subject: Re: [5/9] Main target-independent support for direct interleaving
From: Richard Guenther
To: Richard Guenther, gcc-patches@gcc.gnu.org, patches@linaro.org, richard.sandiford@linaro.org
Date: Mon, 18 Apr 2011 12:54:00 -0000

On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford wrote:
> Richard Guenther writes:
>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford wrote:
>>> This patch adds vec_load_lanes and vec_store_lanes optabs for
>>> instructions like NEON's vldN and vstN.  The optabs are defined this
>>> way because the vectors must be allocated to a block of consecutive
>>> registers.
>>>
>>> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>>         * doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
>>>         * optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
>>>         convert_optab_index values.
>>>         (vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
>>>         * genopinit.c (optabs): Initialize the new optabs.
>>>         * internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
>>>         * internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
>>>         (expand_STORE_LANES): New functions.
>>>         * tree.h (build_simple_array_type): Declare.
>>>         * tree.c (build_simple_array_type): New function.
>>>         * tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
>>>         (vect_model_load_cost): Likewise.
>>>         (vect_store_lanes_supported, vect_load_lanes_supported)
>>>         (vect_record_strided_load_vectors): Declare.
>>>         * tree-vect-data-refs.c (vect_lanes_optab_supported_p)
>>>         (vect_store_lanes_supported, vect_load_lanes_supported): New functions.
>>>         (vect_transform_strided_load): Split out statement recording into...
>>>         (vect_record_strided_load_vectors): ...this new function.
>>>         * tree-vect-stmts.c (create_vector_array, read_vector_array)
>>>         (write_vector_array, create_array_ref): New functions.
>>>         (vect_model_store_cost): Add store_lanes_p argument.
>>>         (vect_model_load_cost): Add load_lanes_p argument.
>>>         (vectorizable_store): Try to use store-lanes functions for
>>>         interleaved stores.
>>>         (vectorizable_load): Likewise load-lanes and loads.
>>>         * tree-vect-slp.c (vect_get_and_check_slp_defs)
>>>         (vect_build_slp_tree):
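To make the hardware motivation a bit more concrete, here is a small sketch (not part of the patch) of the vld2/vst2 behaviour the new optabs are meant to expose, written with the standard arm_neon.h intrinsics; the function name and data shapes are invented for illustration:

/* Sketch only: shows the NEON "lane" load/store layout that
   vec_load_lanes/vec_store_lanes are intended to model.  Requires an
   ARM target with NEON; src and dst must each hold 16 int16_t.  */
#include <arm_neon.h>

void
add_interleaved_pairs (int16_t *dst, const int16_t *src)
{
  /* vld2q_s16 de-interleaves 16 values: even-indexed elements go to
     v.val[0], odd-indexed elements to v.val[1].  */
  int16x8x2_t v = vld2q_s16 (src);

  /* Operate on whole de-interleaved vectors.  */
  v.val[0] = vaddq_s16 (v.val[0], v.val[1]);

  /* vst2q_s16 re-interleaves the two vectors back into memory.  */
  vst2q_s16 (dst, v);
}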
>>>
>>> Index: gcc/doc/md.texi
>>> ===================================================================
>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>  consecutive memory locations, operand 1 is the first register, and
>>>  operand 2 is a constant: the number of consecutive registers.
>>>
>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>> +Perform an interleaved load of several vectors from memory operand 1
>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>> +while the memory operand is a flat array that contains the same number
>>> +of elements.  The operation is equivalent to:
>>> +
>>> +@smallexample
>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>> +  for (i = 0; i < c; i++)
>>> +    operand0[i][j] = operand1[j * c + i];
>>> +@end smallexample
>>> +
>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>> +from memory into a register of mode @samp{TI}@.  The register
>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>
>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>> such an operation would have adjacent blocks of siv2qi memory.  But
>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>> redundant?  You could specify that we load NUNITS adjacent vectors into
>> an integer mode of appropriate size.
>
> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
> 8 consecutive V2QI registers.  The first element from register vector I
> would come from operand1[I] and the second element would come from
> operand1[I + 8].  That's meant to be a valid combination.

Ok, but the C loop from the example doesn't seem to match.  Or I
couldn't wrap my head around it despite looking for 5 minutes and
already having coffee ;)

I would have expected the vectors to be laid out in memory as
v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
rather than v0[0], v1[0], v2[0], ..., as I would have thought the
former is more useful (simple unrolling for stride 2).  We'd need a
separate set of optabs for such an interleaving scheme?  In which case
we might want to come up with a more specific name than load_lanes?

> We specifically want to allow:
>
>   GET_MODE_SIZE (@var{m})
>     != GET_MODE_SIZE (@var{n}) * GET_MODE_NUNITS (@var{n})
>
> The vec_load_lanestiv4hi example in the docs is one case of this:
>
>   GET_MODE_SIZE (@var{m}) = 16
>   GET_MODE_SIZE (@var{n}) = 8
>   GET_MODE_NUNITS (@var{n}) = 4
>
> That example maps directly to ARM's vld2.32.  We also want cases
> where @var{m} is three times the size of @var{n} (vld3.WW) and
> cases where @var{m} is four times the size of @var{n} (vld4.WW).
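To make the indexing in the @smallexample loop above concrete, here is a plain C sketch of the vec_load_lanestiv4hi case, i.e. c = 2 vectors of 4 HImode lanes; the function and macro names are invented for illustration:

#include <stdint.h>

/* Sketch of the documented semantics of vec_load_lanestiv4hi:
   c = GET_MODE_SIZE (TI) / GET_MODE_SIZE (V4HI) = 2 vectors,
   each with GET_MODE_NUNITS (V4HI) = 4 HImode lanes.
   operand0[i][j] = operand1[j * c + i], i.e. vector i gathers the
   flat-array elements at indices i, i + c, i + 2*c, ...  */
#define NVECS  2   /* c */
#define NUNITS 4   /* GET_MODE_NUNITS (V4HI) */

void
load_lanes_ti_v4hi (int16_t operand0[NVECS][NUNITS],
                    const int16_t operand1[NVECS * NUNITS])
{
  for (int j = 0; j < NUNITS; j++)
    for (int i = 0; i < NVECS; i++)
      operand0[i][j] = operand1[j * NVECS + i];
}

Note that for c == 2 this walk reads memory in the order v0[0], v1[0],
v0[1], v1[1], ..., while for the V2QI case with c == 8 it gives the
operand1[I] / operand1[I + 8] pattern described above.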
>
>>> +/* Return a representation of ELT_TYPE[NELTS], using indices of type
>>> +   sizetype.  */
>>> +
>>> +tree
>>> +build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)
>>
>> build_array_type_nelts
>
> OK.
>
> Richard
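For reference, a minimal sketch of what the renamed helper could look like, assuming the usual GCC-internal tree helpers (build_index_type, size_int, build_array_type); this is an illustration of the suggestion, not the patch's actual implementation:

/* Sketch only (GCC-internal context assumed): build ELT_TYPE[NELTS]
   with a sizetype-based index domain 0 .. NELTS - 1.  */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree.h"

tree
build_array_type_nelts (tree elt_type, unsigned HOST_WIDE_INT nelts)
{
  /* build_index_type gives an index type with range [0, NELTS - 1];
     build_array_type pairs it with the element type.  */
  return build_array_type (elt_type,
                           build_index_type (size_int (nelts - 1)));
}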