From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 65575 invoked by alias); 25 Aug 2018 18:09:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 65071 invoked by uid 89); 25 Aug 2018 18:09:08 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-6.6 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=aforementioned, Space X-HELO: gate.crashing.org Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 25 Aug 2018 18:09:06 +0000 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id w7PI92HO025465; Sat, 25 Aug 2018 13:09:02 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id w7PI91sg025464; Sat, 25 Aug 2018 13:09:01 -0500 Date: Sat, 25 Aug 2018 18:09:00 -0000 From: Segher Boessenkool To: Bill Schmidt Cc: will_schmidt@vnet.ibm.com, Richard Biener , Bill Schmidt , David Edelsohn , GCC Patches Subject: Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat Message-ID: <20180825180900.GY24439@gate.crashing.org> References: <1533669927.4452.14.camel@brimstone.rchland.ibm.com> <6E9306D6-B8A1-42F6-90E9-0E6CC7AC66FA@linux.ibm.com> <1534434645.18801.8.camel@brimstone.rchland.ibm.com> <20180816205117.GA24439@gate.crashing.org> <1534801470.26070.20.camel@brimstone.rchland.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg01593.txt.bz2 Hey, On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote: > On 8/20/18 4:44 PM, Will Schmidt wrote: > > Enable GIMPLE folding of the vec_splat() intrinsic. (v3). > > > > This uses the tree_vec_extract() function out of tree-vect-generic.c > > to retrieve the splat value, which is a BIT_FIELD_REF. That function is > > made non-static as part of this change. > > > > Testcases are already in-tree. > > > > V2 updates, per feedback previously received. > > Forced arg1 into range (modulo #elements) before attempting to extract > > the splat value. > > Removed the (now unnecessary) code that did bounds-checking before calling > > the tree_vec_extract helper. > > Used arg0_type rather than lhs_type for calculating the tree size. > > > > V3 updates, inspired by additional offline discussion with Segher. > > Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element > > number less than the number of elements supported by the respective ARG1 > > type, so we do not attempt to gimple-fold the intrinsic if we determine our > > value is out of range. Thus, the initial check ensures that ARG1 is both > > constant and has a valid index into ARG0. > > The subsequent modulo operation is no longer necessary, and has been removed. > > 2018-08-20 Will Schmidt > > > > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for > > early gimple folding of vec_splat(). s/ // > > * tree-vect-generic.c: Remove static from tree_vec_extract() definition. > > * gimple-fold.h: Add an extern define for tree_vec_extract(). s/ / / > > + /* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid > > + index into the arg0 vector. */ Needs two more spaces indent? > > + unsigned int n_elts = VECTOR_CST_NELTS (arg0); > > + if (TREE_CODE (arg1) != INTEGER_CST > > + || TREE_INT_CST_LOW (arg1) > (n_elts -1)) > > Space between - and 1. > > I'm still not convinced we should do this. The semantics of vec_splat are > to select the element as arg1 modulo n_elts. Odd as it seems, it's historically > valid for values between 0 and 31 to be used regardless of the value of n_elts. But, the assembler complains. So it never worked, and it seems better to not allow it in the compiler either. A lot of existing testcases generate assembler code the assembler chokes on -- but the testcases are dg-do compile, so the assembler isn't run. This is also PR86987; I have patches, but it snowballs because of the aforementioned existing wrong testcases. > Since arg1 is a constant we can easily bring it into valid range and expand it > early, rather than losing optimization opportunities by keeping this as a > built-in call. We should just error on invalid input. > Do we have test cases for arg1 in {n_elts, ..., 31}? What's GCC's current > behavior? Maybe we already aren't honoring this technically legal case? We do, lots. Sometimes we compile it; the assembler chokes on the output (but the assembler is not run in those testcases). Sometimes we ICE. > Otherwise looks fine to me, though of course I can't approve. Looks fine to me, too. > > + int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type)) > > + * BITS_PER_UNIT / n_elts; (Well one thing then... Wrong indent?) Segher