From: Richard Guenther
To: gcc-patches@gcc.gnu.org, patches@linaro.org, richard.sandiford@linaro.org
Date: Tue, 12 Apr 2011 13:57:00 -0000
Subject: Re: [4/9] Move power-of-two checks for interleaving

On Tue, Apr 12, 2011 at 3:44 PM, Richard Sandiford wrote:
> NEON has vld3 and vst3 instructions, which support an interleaving of
> three vectors.  This patch therefore removes the blanket power-of-two
> requirement for interleaving and enforces it on a per-operation
> basis instead.
>
> The patch also replaces:
>
>   /* Check that the operation is supported.  */
>   if (!vect_strided_store_supported (vectype))
>     return false;
>
> with:
>
>   gcc_assert (vect_strided_store_supported (vectype, length));
>
> because it was vectorizable_store's responsibility to check this upfront.
> Likewise for loads.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         * tree-vectorizer.h (vect_strided_store_supported): Add a
>         HOST_WIDE_INT argument.
>         (vect_strided_load_supported): Likewise.
>         (vect_permute_store_chain): Return void.
>         (vect_transform_strided_load): Likewise.
>         (vect_permute_load_chain): Delete.
>         * tree-vect-data-refs.c (vect_strided_store_supported): Take a
>         count argument.  Check that the count is a power of two.
>         (vect_strided_load_supported): Likewise.
>         (vect_permute_store_chain): Return void.  Update after above changes.
>         Assert that the access is supported.
>         (vect_permute_load_chain): Likewise.
>         (vect_transform_strided_load): Return void.
>         * tree-vect-stmts.c (vectorizable_store): Update calls after
>         above interface changes.
>         (vectorizable_load): Likewise.
>         (vect_analyze_stmt): Don't check for strided powers of two here.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2011-04-12 11:55:07.000000000 +0100
> +++ gcc/tree-vectorizer.h       2011-04-12 11:55:09.000000000 +0100
> @@ -828,16 +828,14 @@ extern tree vect_create_data_ref_ptr (gi
>                                        gimple *, bool, bool *);
>  extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
>  extern tree vect_create_destination_var (tree, tree);
> -extern bool vect_strided_store_supported (tree);
> -extern bool vect_strided_load_supported (tree);
> -extern bool vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
> +extern bool vect_strided_store_supported (tree, unsigned HOST_WIDE_INT);
> +extern bool vect_strided_load_supported (tree, unsigned HOST_WIDE_INT);
> +extern void vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
>                                        gimple_stmt_iterator *, VEC(tree,heap) **);
>  extern tree vect_setup_realignment (gimple, gimple_stmt_iterator *, tree *,
>                                      enum dr_alignment_support, tree,
>                                      struct loop **);
> -extern bool vect_permute_load_chain (VEC(tree,heap) *,unsigned int, gimple,
> -                                     gimple_stmt_iterator *, VEC(tree,heap) **);
> -extern bool vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
> +extern void vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
>                                           gimple_stmt_iterator *);
>
>  extern int vect_get_place_in_interleaving_chain (gimple, gimple);
>  extern tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2011-04-12 11:55:07.000000000 +0100
> +++ gcc/tree-vect-data-refs.c   2011-04-12 11:55:09.000000000 +0100
> @@ -2196,19 +2196,6 @@ vect_analyze_group_access (struct data_r
>            return false;
>          }
>
> -      /* FORNOW: we handle only interleaving that is a power of 2.
> -         We don't fail here if it may be still possible to vectorize the
> -         group using SLP.  If not, the size of the group will be checked in
> -         vect_analyze_operations, and the vectorization will fail.  */
> -      if (exact_log2 (stride) == -1)
> -       {
> -         if (vect_print_dump_info (REPORT_DETAILS))
> -           fprintf (vect_dump, "interleaving is not a power of 2");
> -
> -         if (slp_impossible)
> -           return false;
> -       }
> -
>        if (stride == 0)
>          stride = count;
>
> @@ -3349,13 +3336,22 @@ vect_create_destination_var (tree scalar
>     and FALSE otherwise.  */
>
>  bool
> -vect_strided_store_supported (tree vectype)
> +vect_strided_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
>  {
>    optab interleave_high_optab, interleave_low_optab;
>    enum machine_mode mode;
>
>    mode = TYPE_MODE (vectype);
>
> +  /* vect_permute_store_chain requires the group size to be a power of two.  */
> +  if (exact_log2 (count) == -1)
> +    {
> +      if (vect_print_dump_info (REPORT_DETAILS))
> +       fprintf (vect_dump, "the size of the group of strided accesses"
> +                " is not a power of 2");
> +      return false;
> +    }
> +
>    /* Check that the operation is supported.  */
>    interleave_high_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR,
>                                                 vectype, optab_default);
> @@ -3441,7 +3437,7 @@ vect_strided_store_supported (tree vecty
>     I3:  4 12 20 28  5 13 21 30
>     I4:  6 14 22 30  7 15 23 31.  */
>
> -bool
> +void
>  vect_permute_store_chain (VEC(tree,heap) *dr_chain,
>                            unsigned int length,
>                            gimple stmt,
> @@ -3455,9 +3451,7 @@ vect_permute_store_chain (VEC(tree,heap)
>    unsigned int j;
>    enum tree_code high_code, low_code;
>
> -  /* Check that the operation is supported.  */
> -  if (!vect_strided_store_supported (vectype))
> -    return false;
> +  gcc_assert (vect_strided_store_supported (vectype, length));
>
>    *result_chain = VEC_copy (tree, heap, dr_chain);
>
> @@ -3510,7 +3504,6 @@ vect_permute_store_chain (VEC(tree,heap)
>         }
>        dr_chain = VEC_copy (tree, heap, *result_chain);
>      }
> -  return true;
>  }
>
>  /* Function vect_setup_realignment
> @@ -3787,13 +3780,22 @@ vect_setup_realignment (gimple stmt, gim
>     and FALSE otherwise.  */
>
>  bool
> -vect_strided_load_supported (tree vectype)
> +vect_strided_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
>  {
>    optab perm_even_optab, perm_odd_optab;
>    enum machine_mode mode;
>
>    mode = TYPE_MODE (vectype);
>
> +  /* vect_permute_load_chain requires the group size to be a power of two.  */
> +  if (exact_log2 (count) == -1)
> +    {
> +      if (vect_print_dump_info (REPORT_DETAILS))
> +       fprintf (vect_dump, "the size of the group of strided accesses"
> +                " is not a power of 2");
> +      return false;
> +    }
> +
>    perm_even_optab = optab_for_tree_code (VEC_EXTRACT_EVEN_EXPR, vectype,
>                                           optab_default);
>    if (!perm_even_optab)
> @@ -3905,7 +3907,7 @@ vect_strided_load_supported (tree vectyp
>     3rd vec (E2):  2 6 10 14 18 22 26 30
>     4th vec (E4):  3 7 11 15 19 23 27 31.  */
>
> -bool
> +static void
>  vect_permute_load_chain (VEC(tree,heap) *dr_chain,
>                           unsigned int length,
>                           gimple stmt,
> @@ -3918,9 +3920,7 @@ vect_permute_load_chain (VEC(tree,heap)
>    int i;
>    unsigned int j;
>
> -  /* Check that the operation is supported.  */
> -  if (!vect_strided_load_supported (vectype))
> -    return false;
> +  gcc_assert (vect_strided_load_supported (vectype, length));
>
>    *result_chain = VEC_copy (tree, heap, dr_chain);
>    for (i = 0; i < exact_log2 (length); i++)
> @@ -3963,7 +3963,6 @@ vect_permute_load_chain (VEC(tree,heap)
>         }
>        dr_chain = VEC_copy (tree, heap, *result_chain);
>      }
> -  return true;
>  }
>
>
> @@ -3974,7 +3973,7 @@ vect_permute_load_chain (VEC(tree,heap)
>     the scalar statements.
>  */
>
> -bool
> +void
>  vect_transform_strided_load (gimple stmt, VEC(tree,heap) *dr_chain, int size,
>                               gimple_stmt_iterator *gsi)
>  {
> @@ -3990,8 +3989,7 @@ vect_transform_strided_load (gimple stmt
>      vectors, that are ready for vector computation.  */
>    result_chain = VEC_alloc (tree, heap, size);
>    /* Permute.  */
> -  if (!vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain))
> -    return false;
> +  vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
>
>    /* Put a permuted data-ref in the VECTORIZED_STMT field.
>       Since we scan the chain starting from it's first node, their order
> @@ -4055,7 +4053,6 @@ vect_transform_strided_load (gimple stmt
>      }
>
>    VEC_free (tree, heap, result_chain);
> -  return true;
>  }
>
>  /* Function vect_force_dr_alignment_p.
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2011-04-12 11:55:09.000000000 +0100
> +++ gcc/tree-vect-stmts.c       2011-04-12 11:55:09.000000000 +0100
> @@ -3412,9 +3412,12 @@ vectorizable_store (gimple stmt, gimple_
>      {
>        strided_store = true;
>        first_stmt = DR_GROUP_FIRST_DR (stmt_info);
> -      if (!vect_strided_store_supported (vectype)
> -         && !PURE_SLP_STMT (stmt_info) && !slp)
> -       return false;
> +      if (!slp && !PURE_SLP_STMT (stmt_info))
> +       {
> +         group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +         if (!vect_strided_store_supported (vectype, group_size))
> +           return false;
> +       }
>
>        if (first_stmt == stmt)
>         {
> @@ -3617,9 +3620,8 @@ vectorizable_store (gimple stmt, gimple_
>            {
>              result_chain = VEC_alloc (tree, heap, group_size);
>              /* Permute.  */
> -            if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> -                                           &result_chain))
> -              return false;
> +            vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> +                                      &result_chain);
>            }
>
>           next_stmt = first_stmt;
> @@ -3912,10 +3914,13 @@ vectorizable_load (gimple stmt, gimple_s
>        /* FORNOW */
>        gcc_assert (! nested_in_vect_loop);
>
> -      /* Check if interleaving is supported.  */
> -      if (!vect_strided_load_supported (vectype)
> -         && !PURE_SLP_STMT (stmt_info) && !slp)
> -       return false;
> +      first_stmt = DR_GROUP_FIRST_DR (stmt_info);
> +      if (!slp && !PURE_SLP_STMT (stmt_info))
> +       {
> +         group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +         if (!vect_strided_load_supported (vectype, group_size))
> +           return false;
> +       }
>      }
>
>    if (negative)
> @@ -4344,10 +4349,7 @@ vectorizable_load (gimple stmt, gimple_s
>         {
>            if (strided_load)
>             {
> -             if (!vect_transform_strided_load (stmt, dr_chain,
> -                                               group_size, gsi))
> -               return false;
> -
> +             vect_transform_strided_load (stmt, dr_chain, group_size, gsi);
>               *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
>             }
>           else
> @@ -4766,27 +4768,6 @@ vect_analyze_stmt (gimple stmt, bool *ne
>         return false;
>      }
>
> -  if (!PURE_SLP_STMT (stmt_info))
> -    {
> -      /* Groups of strided accesses whose size is not a power of 2 are not
> -         vectorizable yet using loop-vectorization.  Therefore, if this stmt
> -        feeds non-SLP-able stmts (i.e., this stmt has to be both SLPed and
> -        loop-based vectorized), the loop cannot be vectorized.  */
> -      if (STMT_VINFO_STRIDED_ACCESS (stmt_info)
> -          && exact_log2 (DR_GROUP_SIZE (vinfo_for_stmt (
> -                                        DR_GROUP_FIRST_DR (stmt_info)))) == -1)
> -        {
> -          if (vect_print_dump_info (REPORT_DETAILS))
> -            {
> -              fprintf (vect_dump, "not vectorized: the size of group "
> -                       "of strided accesses is not a power of 2");
> -              print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
> -            }
> -
> -          return false;
> -        }
> -    }
> -
>    return true;
>  }
>