From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26662 invoked by alias); 4 Nov 2011 11:22:12 -0000 Received: (qmail 26654 invoked by uid 22791); 4 Nov 2011 11:22:10 -0000 X-SWARE-Spam-Status: No, hits=-3.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,TW_TM X-Spam-Check-By: sourceware.org Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 04 Nov 2011 11:21:51 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.221.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 8C8BB8F903; Fri, 4 Nov 2011 12:21:49 +0100 (CET) Date: Fri, 04 Nov 2011 11:44:00 -0000 From: Richard Guenther To: Jakub Jelinek Cc: Ira Rosen , Richard Henderson , gcc-patches@gcc.gnu.org Subject: Re: Patch ping In-Reply-To: <20111104102525.GH1052@tyan-ft48-01.lab.bos.redhat.com> Message-ID: References: <20111102201829.GT1052@tyan-ft48-01.lab.bos.redhat.com> <20111104102525.GH1052@tyan-ft48-01.lab.bos.redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-11/txt/msg00536.txt.bz2 On Fri, 4 Nov 2011, Jakub Jelinek wrote: > On Fri, Nov 04, 2011 at 10:52:44AM +0100, Richard Guenther wrote: > > > - Gather vectorization patch + incremental patches > > > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02411.html > > > > I'm not sure I like using new builtins for gather representation > > on the tree level too much, given that we are now moving > > towards using tree codes for suffle. Thus, how complicated > > would it be to have a gather tree code and optab and to > > handle the mixed size index issue in the expander? > > > > I realize this would be quite some reorg to the patchset ... > > so, why did you choose builtins over a more generic approach? > > Because while permutations etc. are common to most targets, > currently gather is very specialized, specific to one target, > with lots of details how it must look like (e.g. the mask stuff where > we currently don't even have tree code for conditional loads or conditional > stores), and additionally unlike VEC_PERM_EXPR etc. which are normal > expressions this one is a reference (and conditional one too), so > I'm afraid I'd need to touch huge amounts of code (most places that > currently handle MEM_REF/TARGET_MEM_REF would need to handle > VEC_GATHER_MEM_REF too, as it is a memory read (well, set of conditional > memory reads). The i?86 backend already has (except 4) all the needed > builtins anyway and handles expanding them too, the 4 ones are just > to cope with the weirdo definition of some of them (half sized vectors). > And when it is represented as builtin, the side-effects are handled by > all optimization passes automatically, similarly how e.g. atomic builtins > are right now builtins and not expressions. > > So I thought it is better to use builtins right now, then when we in 4.8+ > hopefully do something about conditional loads/stores and their > vectorization and if we introduce for that some new GIMPLE representation, > this could be done on top of that. Ok. I guess it's ok to use builtins for now - I didn't think of the memory reference issue ;) > > > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02846.html > > > > + if (TREE_CODE (base) == MEM_REF) > > { > > - off = TREE_OPERAND (base, 1); > > + if (!integer_zerop (TREE_OPERAND (base, 1))) > > + { > > + if (off == NULL_TREE) > > + { > > + double_int moff = mem_ref_offset (base); > > + off = double_int_to_tree (sizetype, moff); > > + } > > + else > > + off = size_binop (PLUS_EXPR, off, TREE_OPERAND (base, 1)); > > > > that's not safe, TREE_OPEAND (base, 1) is of pointer type, so > > you unconditionally need to do the conversion to sizetype of > > TREE_OPEAND (base, 1). > > Ok, will fix. > > > The routine lacks comments - it's got quite big and fails to > > And add the comments. > > > state any reason for its complexity. I'm also not sure why > > DR would include any loop invariant parts of the SCEV - doesn't > > it instantiate just for the loop in question? > > I'm not sure I understand your question. With the incremental > patch I'm not using any DR info appart from DR_REF to determine > what is loop invariant part of the address and what is not. > The reason for that is that split_constant_offset turns the GIMPLE > code into a sometimes big tree, which actually may contain a mixture > of loop invariants/constants and SSA_NAMEs defined in the loop, > that all with casts, multiplications/additions/subtractions. > For gather I need to split it into a single loop invariant > argument (which can be computed before the loop as loop invariant > and thus can be arbitrary tree expression that is just gimplified > there) and another SSA_NAME defined into the loop which can be > vectorized which is perhaps sign-extended and multiplied by 1/2/4/8. > > With the approach the incremental patch does I just > walk what split_constant_offset during DR walks and peel off > loop invariants until I have something that should be used as the > vectorized index. It looks like split_constant_offset walks def stmts in an unbound fashion. That's surely a bad idea - SCEV should already have expanded everything non-loop-invariant, thus it should at most look through DEFs that trivially add to the constant offset, not through others. Richard.