Date: Mon, 09 Sep 2013 18:02:00 -0000
From: Marc Glisse
Reply-To: gcc@gcc.gnu.org
To: Vidya Praveen
cc: gcc@gcc.gnu.org, rguenther@suse.de, ook@ucw.cz
Subject: Re: [RFC] Vectorization of indexed elements
In-Reply-To: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>

On Mon, 9 Sep 2013, Vidya Praveen wrote:

> Hello,
>
> This post details some thoughts on an enhancement to the vectorizer that
> could take advantage of the SIMD instructions that allows indexed element
> as an operand thus reducing the need for duplication and possibly improve
> reuse of previously loaded data.
>
> Appreciate your opinion on this.
>
> ---
>
> A phrase like this:
>
>  for(i=0;i<4;i++)
>    a[i] = b[i] <op> c[2];
>
> is usually vectorized as:
>
>   va:V4SI = a[0:3]
>   vb:V4SI = b[0:3]
>   t = c[2]
>   vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
>   ...
>   va:V4SI = vb:V4SI <op> vc:V4SI
>
> But this could be simplified further if a target has instructions that support
> indexed element as a parameter. For example an instruction like this:
>
>  mul v0.4s, v1.4s, v2.4s[2]
>
> can perform multiplication of each element of v1.4s with the third element of
> v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> elements of v0.4s.
>
> For this to happen, vectorizer needs to understand this idiom and treat the
> operand c[2] specially (and by taking into consideration if the machine
> supports indexed element as an operand for <op> through a target hook or macro)
> and consider this as vectorizable statement without having to duplicate the
> elements explicitly.
>
> There are a few ways this could be represented at gimple:
>
>   ...
>   va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
>   ...
>
> or by allowing the vectorizer to treat an indexed element as a valid operand
> in a vectorizable statement:

Might as well allow any scalar then...

>   ...
>   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
>   ...
>
> For the sake of explanation, the above two representations assume that
> c[0:3] is loaded in vc for some other use and reused here. But when c[2] is the
> only use of 'c' then it may be safer to just load one element and use it like
> this:
>
>   vc:V4SI[0] = c[2]
>   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
>
> This could also mean that expressions involving scalar could be treated
> similarly. For example,
>
>   for(i=0;i<4;i++)
>     a[i] = b[i] <op> c
>
> could be vectorized as:
>
>   vc:V4SI[0] = c
>   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
>
> Such a change would also require new standard pattern names to be defined for
> each <op>.
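
For concreteness, here is a minimal sketch in C of the two forms being compared, written with GCC's generic vector extensions and taking <op> to be multiplication, as in the mul example. The names v4si, mul_lane_scalar and mul_lane_dup are made up for illustration and are not anything the vectorizer defines. Whether the second function ends up as a single indexed-element multiply depends on the target's patterns; the code only shows the select-then-duplicate shape.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes, i.e. V4SI; the typedef name is illustrative. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Scalar reference loop: a[i] = b[i] * c[lane]. */
void mul_lane_scalar(int32_t a[4], const int32_t b[4],
                     const int32_t c[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    for (int i = 0; i < 4; i++)
        a[i] = b[i] * c[lane];
}

/* Select one lane, duplicate it across a vector, then do a full
   vector multiply. This mirrors vec_duplicate(vec_select(vc, lane)). */
v4si mul_lane_dup(v4si vb, v4si vc, int lane)
{
    assert(lane >= 0 && lane < 4);
    int32_t t = vc[lane];       /* the "select" step */
    v4si vt = { t, t, t, t };   /* the "duplicate" step */
    return vb * vt;             /* element-wise multiply */
}
```

Calling mul_lane_dup(vb, vc, 2) should give the same four results as the scalar loop with lane 2; inspecting the generated assembly on a given target is one way to see whether the duplicate survives as a separate instruction.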
>
> Alternatively, having something like this:
>
>   ...
>   vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
>   va:V4SI = vb:V4SI <op> vt:V4SI
>   ...
>
> would remove the need to introduce several new standard pattern names but have
> just one to represent vec_duplicate(vec_select()) but of course this will expect
> the target to have combiner patterns.

The cost estimation wouldn't be very good, but aren't combine patterns
enough for the whole thing? Don't you model your mul instruction as:

(mult:V4SI
   (match_operand:V4SI)
   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))

anyway? It seems that combine should be able to handle it. What currently
happens that makes us fail to generate the right instruction?

In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR for
vec_duplicate; adding new nodes is always painful.

> This enhancement could possibly help further optimizing larger scenarios such
> as linear systems.
>
> Regards
> VP

-- 
Marc Glisse