Date: Wed, 13 Jun 2012 09:32:00 -0000
From: Richard Guenther
To: "William J. Schmidt"
Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com
Subject: Re: [PATCH, RFC] First cut at using vec_construct for strided loads
In-Reply-To: <1339553936.18291.15.camel@gnopaine>

On Tue, 12 Jun 2012, William J. Schmidt wrote:

> This patch is a follow-up to the discussion generated by
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html. I've added
> vec_construct to the cost model for use in vect_model_load_cost, and
> implemented a cost calculation that makes sense to me for PowerPC.
> I'm less certain about the default, i386, and spu implementations. I took a
> guess at i386 from the discussions we had, and used the same calculation
> for the default and for spu. I'm hoping you or others can fill in the
> blanks if I guessed badly.
>
> The i386 cost for vec_construct is different from all the others, which
> are parameterized for each processor description. This should probably
> be parameterized in some way as well, but thought you'd know better than
> I how that should be. Perhaps instead of
>
>   elements / 2 + 1
>
> it should be
>
>   (elements / 2) * X + Y
>
> where X and Y are taken from the processor description, and represent
> the cost of a merge and a permute, respectively. Let me know what you
> think.

Looks good to me with the gcc_asserts removed - TYPE_VECTOR_SUBPARTS
might be 1 for V1TImode for example (heh, not that the vectorizer would
vectorize to that). But I don't see any possible breakage with
elements == 1, do you?

Target maintainers can improve on the cost calculation if they wish;
the default looks sensible to me.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> 2012-06-12  Bill Schmidt
>
>     * targhooks.c (default_builtin_vectorization_cost): Handle
>     vec_construct, using vectype to base cost on subparts.
>     * target.h (enum vect_cost_for_stmt): Add vec_construct.
>     * tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
>     instead of scalar_to_vec.
>     * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
>     vec_construct in same way as default for now.
>     * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
>     * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
>     Handle vec_construct, including special case for 32-bit loads.
>
>
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c (revision 188482)
> +++ gcc/targhooks.c (working copy)
> @@ -499,9 +499,11 @@ default_builtin_vectorized_conversion (unsigned in
>
>  int
>  default_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                    tree vectype ATTRIBUTE_UNUSED,
> +                                    tree vectype,
>                                      int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -524,6 +526,11 @@ default_builtin_vectorization_cost (enum vect_cost
>        case cond_branch_taken:
>          return 3;
>
> +      case vec_construct:
> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> +        gcc_assert (elements > 1);
> +        return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h (revision 188482)
> +++ gcc/target.h (working copy)
> @@ -146,7 +146,8 @@ enum vect_cost_for_stmt
>    cond_branch_not_taken,
>    cond_branch_taken,
>    vec_perm,
> -  vec_promote_demote
> +  vec_promote_demote,
> +  vec_construct
>  };
>
>  /* The target structure. This holds all the backend hooks. */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c (revision 188482)
> +++ gcc/tree-vect-stmts.c (working copy)
> @@ -1031,11 +1031,13 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>    /* The loads themselves.
>       */
>    if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
>      {
> -      /* N scalar loads plus gathering them into a vector.
> -         ??? scalar_to_vec isn't the cost for that. */
> +      /* N scalar loads plus gathering them into a vector. */
> +      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>        inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
> -                      * TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)));
> -      inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec);
> +                      * TYPE_VECTOR_SUBPARTS (vectype));
> +      inside_cost += ncopies
> +        * targetm.vectorize.builtin_vectorization_cost (vec_construct,
> +                                                         vectype, 0);
>      }
>    else
>      vect_get_load_cost (first_dr, ncopies,
> Index: gcc/config/spu/spu.c
> ===================================================================
> --- gcc/config/spu/spu.c (revision 188482)
> +++ gcc/config/spu/spu.c (working copy)
> @@ -6908,9 +6908,11 @@ spu_builtin_mask_for_load (void)
>  /* Implement targetm.vectorize.builtin_vectorization_cost.
>     */
>  static int
>  spu_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                tree vectype ATTRIBUTE_UNUSED,
> +                                tree vectype,
>                                  int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -6937,6 +6939,11 @@ spu_builtin_vectorization_cost (enum vect_cost_for
>        case cond_branch_taken:
>          return 6;
>
> +      case vec_construct:
> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> +        gcc_assert (elements > 1);
> +        return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c (revision 188482)
> +++ gcc/config/i386/i386.c (working copy)
> @@ -36072,9 +36072,11 @@ static const struct attribute_spec ix86_attribute_
>  /* Implement targetm.vectorize.builtin_vectorization_cost.
>     */
>  static int
>  ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                 tree vectype ATTRIBUTE_UNUSED,
> +                                 tree vectype,
>                                   int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -36115,6 +36117,11 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo
>        case vec_promote_demote:
>          return ix86_cost->vec_stmt_cost;
>
> +      case vec_construct:
> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> +        gcc_assert (elements > 1);
> +        return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c (revision 188482)
> +++ gcc/config/rs6000/rs6000.c (working copy)
> @@ -3405,6 +3405,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>                                     tree vectype, int misalign)
>  {
>    unsigned elements;
> +  tree elem_type;
>
>    switch (type_of_cost)
>      {
> @@ -3504,6 +3505,19 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>
>          return 2;
>
> +      case vec_construct:
> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> +        elem_type = TREE_TYPE (vectype);
> +        gcc_assert (elements > 1);
> +        /* 32-bit vectors loaded into registers are stored as double
> +           precision, so we need n/2 converts in addition to the usual
> +           n/2 merges to construct a vector of short floats from them. */
> +        if (SCALAR_FLOAT_TYPE_P (elem_type)
> +            && TYPE_PRECISION (elem_type) == 32)
> +          return elements + 1;
> +        else
> +          return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
>
>
>

-- 
Richard Guenther
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer