Subject: Re: [5/9] Main target-independent support for direct interleaving
From: Richard Guenther
To: Richard Guenther, gcc-patches@gcc.gnu.org, patches@linaro.org, richard.sandiford@linaro.org
Date: Mon, 18 Apr 2011 12:54:00 -0000

On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford wrote:
> Richard Guenther writes:
>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford wrote:
>>> This patch adds vec_load_lanes and vec_store_lanes optabs for
>>> instructions like NEON's vldN and vstN.  The optabs are defined this
>>> way because the vectors must be allocated to a block of consecutive
>>> registers.
>>>
>>> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>>         * doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
>>>         * optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
>>>         convert_optab_index values.
>>>         (vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
>>>         * genopinit.c (optabs): Initialize the new optabs.
>>>         * internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
>>>         * internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
>>>         (expand_STORE_LANES): New functions.
>>>         * tree.h (build_simple_array_type): Declare.
>>>         * tree.c (build_simple_array_type): New function.
>>>         * tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
>>>         (vect_model_load_cost): Likewise.
>>>         (vect_store_lanes_supported, vect_load_lanes_supported)
>>>         (vect_record_strided_load_vectors): Declare.
>>>         * tree-vect-data-refs.c (vect_lanes_optab_supported_p)
>>>         (vect_store_lanes_supported, vect_load_lanes_supported): New functions.
>>>         (vect_transform_strided_load): Split out statement recording into...
>>>         (vect_record_strided_load_vectors): ...this new function.
>>>         * tree-vect-stmts.c (create_vector_array, read_vector_array)
>>>         (write_vector_array, create_array_ref): New functions.
>>>         (vect_model_store_cost): Add store_lanes_p argument.
>>>         (vect_model_load_cost): Add load_lanes_p argument.
>>>         (vectorizable_store): Try to use store-lanes functions for
>>>         interleaved stores.
>>>         (vectorizable_load): Likewise load-lanes and loads.
>>>         * tree-vect-slp.c (vect_get_and_check_slp_defs)
>>>         (vect_build_slp_tree):
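To make the hardware motivation a bit more concrete, here is a small sketch (not part of the patch) of the vld2/vst2 behaviour the new optabs are meant to expose, written with the standard arm_neon.h intrinsics; the function name and data shapes are invented for illustration:

/* Sketch only: shows the NEON "lane" load/store layout that
   vec_load_lanes/vec_store_lanes are intended to model.  Requires an
   ARM target with NEON; src and dst must each hold 16 int16_t.  */
#include <arm_neon.h>

void
add_interleaved_pairs (int16_t *dst, const int16_t *src)
{
  /* vld2q_s16 de-interleaves 16 values: even-indexed elements go to
     v.val[0], odd-indexed elements to v.val[1].  */
  int16x8x2_t v = vld2q_s16 (src);

  /* Operate on whole de-interleaved vectors.  */
  v.val[0] = vaddq_s16 (v.val[0], v.val[1]);

  /* vst2q_s16 re-interleaves the two vectors back into memory.  */
  vst2q_s16 (dst, v);
}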
>>>
>>> Index: gcc/doc/md.texi
>>> ===================================================================
>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>  consecutive memory locations, operand 1 is the first register, and
>>>  operand 2 is a constant: the number of consecutive registers.
>>>
>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>> +Perform an interleaved load of several vectors from memory operand 1
>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>> +while the memory operand is a flat array that contains the same number
>>> +of elements.  The operation is equivalent to:
>>> +
>>> +@smallexample
>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>> +  for (i = 0; i < c; i++)
>>> +    operand0[i][j] = operand1[j * c + i];
>>> +@end smallexample
>>> +
>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>> +from memory into a register of mode @samp{TI}@.  The register
>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>
>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>> such an operation would have adjacent blocks of siv2qi memory.  But
>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>> redundant?  You could specify that we load NUNITS adjacent vectors into
>> an integer mode of appropriate size.
>
> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
> 8 consecutive V2QI registers.  The first element from register vector I
> would come from operand1[I] and the second element would come from
> operand1[I + 8].  That's meant to be a valid combination.

Ok, but the C loop from the example doesn't seem to match.  Or I
couldn't wrap my head around it despite looking for 5 minutes and
already having coffee ;)

I would have expected the vectors to be laid out in memory as
v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
rather than v0[0], v1[0], v2[0], ..., as I would have thought the
former is more useful (simple unrolling for stride 2).  We'd need a
separate set of optabs for such an interleaving scheme?  In which case
we might want to come up with a more specific name than load_lanes?

> We specifically want to allow:
>
>   GET_MODE_SIZE (@var{m})
>     != GET_MODE_SIZE (@var{n}) * GET_MODE_NUNITS (@var{n})
>
> The vec_load_lanestiv4hi example in the docs is one case of this:
>
>   GET_MODE_SIZE (@var{m}) = 16
>   GET_MODE_SIZE (@var{n}) = 8
>   GET_MODE_NUNITS (@var{n}) = 4
>
> That example maps directly to ARM's vld2.32.  We also want cases
> where @var{m} is three times the size of @var{n} (vld3.WW) and
> cases where @var{m} is four times the size of @var{n} (vld4.WW).
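To make the indexing in the @smallexample loop above concrete, here is a plain C sketch of the vec_load_lanestiv4hi case, i.e. c = 2 vectors of 4 HImode lanes; the function and macro names are invented for illustration:

#include <stdint.h>

/* Sketch of the documented semantics of vec_load_lanestiv4hi:
   c = GET_MODE_SIZE (TI) / GET_MODE_SIZE (V4HI) = 2 vectors,
   each with GET_MODE_NUNITS (V4HI) = 4 HImode lanes.
   operand0[i][j] = operand1[j * c + i], i.e. vector i gathers the
   flat-array elements at indices i, i + c, i + 2*c, ...  */
#define NVECS  2   /* c */
#define NUNITS 4   /* GET_MODE_NUNITS (V4HI) */

void
load_lanes_ti_v4hi (int16_t operand0[NVECS][NUNITS],
                    const int16_t operand1[NVECS * NUNITS])
{
  for (int j = 0; j < NUNITS; j++)
    for (int i = 0; i < NVECS; i++)
      operand0[i][j] = operand1[j * NVECS + i];
}

Note that for c == 2 this walk reads memory in the order v0[0], v1[0],
v0[1], v1[1], ..., while for the V2QI case with c == 8 it gives the
operand1[I] / operand1[I + 8] pattern described above.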
>
>>> +/* Return a representation of ELT_TYPE[NELTS], using indices of type
>>> +   sizetype.  */
>>> +
>>> +tree
>>> +build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)
>>
>> build_array_type_nelts
>
> OK.
>
> Richard
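For reference, a minimal sketch of what the renamed helper could look like, assuming the usual GCC-internal tree helpers (build_index_type, size_int, build_array_type); this is an illustration of the suggestion, not the patch's actual implementation:

/* Sketch only (GCC-internal context assumed): build ELT_TYPE[NELTS]
   with a sizetype-based index domain 0 .. NELTS - 1.  */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree.h"

tree
build_array_type_nelts (tree elt_type, unsigned HOST_WIDE_INT nelts)
{
  /* build_index_type gives an index type with range [0, NELTS - 1];
     build_array_type pairs it with the element type.  */
  return build_array_type (elt_type,
                           build_index_type (size_int (nelts - 1)));
}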