From: Richard Biener
To: Bill Schmidt
Cc: Uros Bizjak, GCC Patches, Peter Bergner
Date: Wed, 08 Jun 2016 14:05:00 -0000
Subject: Re: [PATCH, RFC] First cut at using vec_construct for strided loads
In-Reply-To: <24C78655-372F-4FDA-80AA-C1EF9F0DFE97@linux.vnet.ibm.com>
References: <1339553936.18291.15.camel@gnopaine> <24C78655-372F-4FDA-80AA-C1EF9F0DFE97@linux.vnet.ibm.com>
On Wed, 8 Jun 2016, Bill Schmidt wrote:

> Hi Richard,
>
> > On Jun 8, 2016, at 7:29 AM, Richard Biener wrote:
> >
> > On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt
> > wrote:
> >> This patch is a follow-up to the discussion generated by
> >> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html. I've added
> >> vec_construct to the cost model for use in vect_model_load_cost, and
> >> implemented a cost calculation that makes sense to me for PowerPC. I'm
> >> less certain about the default, i386, and spu implementations. I took a
> >> guess at i386 from the discussions we had, and used the same calculation
> >> for the default and for spu. I'm hoping you or others can fill in the
> >> blanks if I guessed badly.
> >>
> >> The i386 cost for vec_construct is different from all the others, which
> >> are parameterized for each processor description. This should probably
> >> be parameterized in some way as well, but thought you'd know better than
> >> I how that should be. Perhaps instead of
> >>
> >>         elements / 2 + 1
> >>
> >> it should be
> >>
> >>         (elements / 2) * X + Y
> >>
> >> where X and Y are taken from the processor description, and represent
> >> the cost of a merge and a permute, respectively. Let me know what you
> >> think.
> >
> > Just trying to understand how you arrived at the above formulas in investigating
> > strangely low cost for v16qi construction of 9. If we pairwise reduce elements
> > with a cost of 1 then we arrive at a cost of elements - 1, that's what you'd
> > get with not accounting an initial move of element zero into a vector and then
> > inserting each other element into that with elements - 1 inserts.
>
> What I wrote there only makes partial sense for certain types on Power, so far as
> I can tell, and even then it doesn’t generalize properly. When the scalar registers
> are contained in the vector registers (as happens for floating-point on Power), then
> you can do some merges and other forms of permutes to combine them faster
> than doing specific inserts. But that isn’t a general solution even on Power; for the
> integer modes we still do inserts.

You mean Power has instructions to combine more than two vector registers
into one? Otherwise you still need n / 2 plus n / 4 plus n / 8 ... "permutes"
which boils down to n - 1.

> So what you have makes sense to me, and what’s currently in place for Power needs
> work also, so far as I can tell. I’ll take a note to revisit this.

Thanks.
Richard.

> -- Bill
>
> >
> > This also matches up with code-generation on x86_64 for
> >
> > vT foo (T a, T b, ...)
> > {
> >   return (vT) {a, b, ... };
> > }
> >
> > for any vector / element type combination I tried. Thus the patch below.
> >
> > I'll bootstrap / test that on x86_64-linux and I'm leaving other
> > targets to target
> > maintainers.
> >
> > Ok for the i386 parts?
> >
> > Thanks,
> > Richard.
> >
> > 2016-06-08 Richard Biener
> >
> >         * targhooks.c (default_builtin_vectorization_cost): Adjust
> >         vec_construct cost.
> >         * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> >
> > Index: gcc/targhooks.c
> > ===================================================================
> > --- gcc/targhooks.c (revision 237196)
> > +++ gcc/targhooks.c (working copy)
> > @@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum
> >          return 3;
> >
> >        case vec_construct:
> > -        elements = TYPE_VECTOR_SUBPARTS (vectype);
> > -        return elements / 2 + 1;
> > +        return TYPE_VECTOR_SUBPARTS (vectype) - 1;
> >
> >        default:
> >          gcc_unreachable ();
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c (revision 237196)
> > +++ gcc/config/i386/i386.c (working copy)
> > @@ -49503,8 +49520,6 @@ static int
> >  ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >                                   tree vectype, int)
> >  {
> > -  unsigned elements;
> > -
> >    switch (type_of_cost)
> >      {
> >        case scalar_stmt:
> > @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
> >          return ix86_cost->vec_stmt_cost;
> >
> >        case vec_construct:
> > -        elements = TYPE_VECTOR_SUBPARTS (vectype);
> > -        return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
> > +        return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 1);
> >
> >        default:
> >          gcc_unreachable ();
> >
> >
> >> Thanks,
> >> Bill
> >>
> >>
> >> 2012-06-12 Bill Schmidt
> >>
> >>         * targhooks.c (default_builtin_vectorized_conversion): Handle
> >>         vec_construct, using vectype to base cost on subparts.
> >>         * target.h (enum vect_cost_for_stmt): Add vec_construct.
> >>         * tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
> >>         instead of scalar_to-vec.
> >>         * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
> >>         vec_construct in same way as default for now.
> >>         * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> >>         * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> >>         Handle vec_construct, including special case for 32-bit loads.
> >>
> >>
> >> Index: gcc/targhooks.c
> >> ===================================================================
> >> --- gcc/targhooks.c (revision 188482)
> >> +++ gcc/targhooks.c (working copy)
> >> @@ -499,9 +499,11 @@ default_builtin_vectorized_conversion (unsigned in
> >>
> >>  int
> >>  default_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >> -                                    tree vectype ATTRIBUTE_UNUSED,
> >> +                                    tree vectype,
> >>                                      int misalign ATTRIBUTE_UNUSED)
> >>  {
> >> +  unsigned elements;
> >> +
> >>    switch (type_of_cost)
> >>      {
> >>        case scalar_stmt:
> >> @@ -524,6 +526,11 @@ default_builtin_vectorization_cost (enum vect_cost
> >>        case cond_branch_taken:
> >>          return 3;
> >>
> >> +      case vec_construct:
> >> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +        gcc_assert (elements > 1);
> >> +        return elements / 2 + 1;
> >> +
> >>        default:
> >>          gcc_unreachable ();
> >>      }
> >> Index: gcc/target.h
> >> ===================================================================
> >> --- gcc/target.h (revision 188482)
> >> +++ gcc/target.h (working copy)
> >> @@ -146,7 +146,8 @@ enum vect_cost_for_stmt
> >>    cond_branch_not_taken,
> >>    cond_branch_taken,
> >>    vec_perm,
> >> -  vec_promote_demote
> >> +  vec_promote_demote,
> >> +  vec_construct
> >>  };
> >>
> >>  /* The target structure. This holds all the backend hooks. */
> >> Index: gcc/tree-vect-stmts.c
> >> ===================================================================
> >> --- gcc/tree-vect-stmts.c (revision 188482)
> >> +++ gcc/tree-vect-stmts.c (working copy)
> >> @@ -1031,11 +1031,13 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
> >>    /* The loads themselves. */
> >>    if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> >>      {
> >> -      /* N scalar loads plus gathering them into a vector.
> >> -         ??? scalar_to_vec isn't the cost for that. */
> >> +      /* N scalar loads plus gathering them into a vector. */
> >> +      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >>        inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
> >> -                      * TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)));
> >> -      inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec);
> >> +                      * TYPE_VECTOR_SUBPARTS (vectype));
> >> +      inside_cost += ncopies
> >> +        * targetm.vectorize.builtin_vectorization_cost (vec_construct,
> >> +                                                        vectype, 0);
> >>      }
> >>    else
> >>      vect_get_load_cost (first_dr, ncopies,
> >> Index: gcc/config/spu/spu.c
> >> ===================================================================
> >> --- gcc/config/spu/spu.c (revision 188482)
> >> +++ gcc/config/spu/spu.c (working copy)
> >> @@ -6908,9 +6908,11 @@ spu_builtin_mask_for_load (void)
> >>  /* Implement targetm.vectorize.builtin_vectorization_cost. */
> >>  static int
> >>  spu_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >> -                                tree vectype ATTRIBUTE_UNUSED,
> >> +                                tree vectype,
> >>                                  int misalign ATTRIBUTE_UNUSED)
> >>  {
> >> +  unsigned elements;
> >> +
> >>    switch (type_of_cost)
> >>      {
> >>        case scalar_stmt:
> >> @@ -6937,6 +6939,11 @@ spu_builtin_vectorization_cost (enum vect_cost_for
> >>        case cond_branch_taken:
> >>          return 6;
> >>
> >> +      case vec_construct:
> >> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +        gcc_assert (elements > 1);
> >> +        return elements / 2 + 1;
> >> +
> >>        default:
> >>          gcc_unreachable ();
> >>      }
> >> Index: gcc/config/i386/i386.c
> >> ===================================================================
> >> --- gcc/config/i386/i386.c (revision 188482)
> >> +++ gcc/config/i386/i386.c (working copy)
> >> @@ -36072,9 +36072,11 @@ static const struct attribute_spec ix86_attribute_
> >>  /* Implement targetm.vectorize.builtin_vectorization_cost. */
> >>  static int
> >>  ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >> -                                 tree vectype ATTRIBUTE_UNUSED,
> >> +                                 tree vectype,
> >>                                   int misalign ATTRIBUTE_UNUSED)
> >>  {
> >> +  unsigned elements;
> >> +
> >>    switch (type_of_cost)
> >>      {
> >>        case scalar_stmt:
> >> @@ -36115,6 +36117,11 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo
> >>        case vec_promote_demote:
> >>          return ix86_cost->vec_stmt_cost;
> >>
> >> +      case vec_construct:
> >> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +        gcc_assert (elements > 1);
> >> +        return elements / 2 + 1;
> >> +
> >>        default:
> >>          gcc_unreachable ();
> >>      }
> >> Index: gcc/config/rs6000/rs6000.c
> >> ===================================================================
> >> --- gcc/config/rs6000/rs6000.c (revision 188482)
> >> +++ gcc/config/rs6000/rs6000.c (working copy)
> >> @@ -3405,6 +3405,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> >>                                     tree vectype, int misalign)
> >>  {
> >>    unsigned elements;
> >> +  tree elem_type;
> >>
> >>    switch (type_of_cost)
> >>      {
> >> @@ -3504,6 +3505,19 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> >>
> >>          return 2;
> >>
> >> +      case vec_construct:
> >> +        elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +        elem_type = TREE_TYPE (vectype);
> >> +        gcc_assert (elements > 1);
> >> +        /* 32-bit vectors loaded into registers are stored as double
> >> +           precision, so we need n/2 converts in addition to the usual
> >> +           n/2 merges to construct a vector of short floats from them. */
> >> +        if (SCALAR_FLOAT_TYPE_P (elem_type)
> >> +            && TYPE_PRECISION (elem_type) == 32)
> >> +          return elements + 1;
> >> +        else
> >> +          return elements / 2 + 1;
> >> +
> >>        default:
> >>          gcc_unreachable ();
> >>      }
> >>
> >>
> >
> >

-- 
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)