From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Date: Mon, 09 Sep 2013 18:02:00 -0000
From: Marc Glisse <marc.glisse@inria.fr>
Reply-To: gcc@gcc.gnu.org
To: Vidya Praveen <vidyapraveen@arm.com>
cc: gcc@gcc.gnu.org, rguenther@suse.de, ook@ucw.cz
Subject: Re: [RFC] Vectorization of indexed elements
In-Reply-To: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>
Message-ID: <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-SW-Source: 2013-09/txt/msg00062.txt.bz2

On Mon, 9 Sep 2013, Vidya Praveen wrote:

> Hello,
>
> This post details some thoughts on an enhancement to the vectorizer that
> could take advantage of SIMD instructions that accept an indexed element
> as an operand, thus reducing the need for explicit duplication and possibly
> improving reuse of previously loaded data.
>
> I would appreciate your opinion on this.
>
> ---
>
> A loop like this:
>
> for(i=0;i<4;i++)
>   a[i] = b[i] <op> c[2];
>
> is usually vectorized as:
>
>  va:V4SI = a[0:3]
>  vb:V4SI = b[0:3]
>  t = c[2]
>  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
>  ...
>  va:V4SI = vb:V4SI <op> vc:V4SI
>
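
To make the baseline concrete, here is a minimal C sketch of the form above. It assumes <op> is multiplication on 32-bit ints and uses the GCC/Clang vector_size extension to stand in for V4SI; the names scalar_loop and splat_then_mul are illustrative only. The point is that the broadcast vector has to be materialized before the single element-wise multiply.

```c
#include <assert.h>

/* GCC/Clang vector extension: four 32-bit ints, standing in for V4SI. */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar reference loop: a[i] = b[i] * c[k], with k = 2 in the example. */
void scalar_loop(int a[4], const int b[4], const int c[4], int k)
{
    assert(k >= 0 && k < 4);
    for (int i = 0; i < 4; i++)
        a[i] = b[i] * c[k];
}

/* The usual vectorization written by hand: load c[k] as a scalar,
   duplicate it into every lane, then do one element-wise multiply. */
v4si splat_then_mul(v4si vb, const int c[4], int k)
{
    assert(k >= 0 && k < 4);
    int t = c[k];               /* t = c[2]                 */
    v4si vt = { t, t, t, t };   /* vc:V4SI = { t, t, t, t } */
    return vb * vt;             /* va = vb <op> vc          */
}
```

With b = {1, 2, 3, 4}, c = {10, 20, 30, 40} and k = 2, both functions produce {30, 60, 90, 120}.
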
> But this could be simplified further if a target has instructions that accept
> an indexed element as an operand. For example, an AArch64 instruction like this:
>
>  mul v0.4s, v1.4s, v2.s[2]
>
> multiplies each element of v1.4s by the third element of v2 (specified as
> v2.s[2]) and stores the results in the corresponding elements of v0.4s.
>
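
A portable C model of the by-element semantics may help here. This is a sketch, not the architectural definition: it assumes 32-bit int lanes, uses the GCC/Clang vector extension again, and mul_by_element is a hypothetical helper name. On AArch64 the ACLE intrinsic vmulq_laneq_s32 from <arm_neon.h> exposes this operation directly, but the emulation below runs on any host.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Model of "mul vd.4s, vn.4s, vm.s[lane]": every lane of vn is multiplied
   by the single lane vm[lane].  No broadcast vector is built; the lane
   index is part of the operation. */
v4si mul_by_element(v4si vn, v4si vm, int lane)
{
    assert(lane >= 0 && lane < 4);
    v4si vd = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; i++)
        vd[i] = vn[i] * vm[lane];
    return vd;
}
```

For the real instruction the lane must be an immediate, i.e. a compile-time constant; the emulation accepts a run-time value only for convenience.
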
> For this to happen, the vectorizer needs to recognize this idiom, treat the
> operand c[2] specially (taking into consideration, through a target hook or
> macro, whether the machine supports an indexed element as an operand for
> <op>), and consider this a vectorizable statement without having to
> duplicate the elements explicitly.
>
> There are a few ways this could be represented in GIMPLE:
>
>  ...
>  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
>  ...
>
> or by allowing the vectorizer to treat an indexed element as a valid
> operand in a vectorizable statement:

Might as well allow any scalar then...

>  ...
>  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
>  ...
>
> For the sake of explanation, the above two representations assume that
> c[0:3] is loaded into vc for some other use and reused here. But when c[2]
> is the only use of 'c', it may be safer to just load one element and use it
> like this:
>
>  vc:V4SI[0] = c[2]
>  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
>
> This also means that expressions involving a scalar could be treated
> similarly. For example,
>
>  for(i=0;i<4;i++)
>    a[i] = b[i] <op> c;
>
> could be vectorized as:
>
>  vc:V4SI[0] = c
>  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
>
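
The single-element variant can be modelled the same way: write only lane 0, then select lane 0. In this sketch (same assumptions: int lanes, GCC/Clang vector extension, hypothetical helper names) the unused lanes are zeroed purely so the value is well defined; the by-element multiply never reads them. The same helper covers both c[2] and a plain scalar c.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* By-element multiply model: vd[i] = vn[i] * vm[lane]. */
v4si mul_by_element(v4si vn, v4si vm, int lane)
{
    assert(lane >= 0 && lane < 4);
    v4si vd = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; i++)
        vd[i] = vn[i] * vm[lane];
    return vd;
}

/* vc:V4SI[0] = s, then va = vb <op> VEC_SELECT_EXPR (vc 0).
   s may be a single loaded element such as c[2] or a loop-invariant
   scalar c; either way only lane 0 of vc is used. */
v4si mul_by_scalar_in_lane0(v4si vb, int s)
{
    v4si vc = { s, 0, 0, 0 };   /* only lane 0 matters */
    return mul_by_element(vb, vc, 0);
}
```
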
> Such a change would also require new standard pattern names to be defined for
> each <op>.
>
> Alternatively, having something like this:
>
>  ...
>  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
>  va:V4SI = vb:V4SI <op> vt:V4SI
>  ...
>
> would remove the need to introduce several new standard pattern names,
> requiring just one to represent vec_duplicate(vec_select()), but of course
> this would require the target to have combiner patterns.

The cost estimation wouldn't be very good, but aren't combine patterns 
enough for the whole thing? Don't you model your mul instruction as:

(mult:V4SI
   (match_operand:V4SI)
   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))

anyway? It seems that combine should be able to handle it. What currently
happens that prevents us from generating the right instruction?
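
To illustrate the combine route, here is a toy rewrite over a made-up expression tree. It is not GCC's combine pass or its RTL data structure; the node kinds only mirror the RTL codes above by name, and TOY_MULT_LANE stands for a hypothetical by-element multiply. It shows that recognising (mult X (vec_duplicate (vec_select Y [lane]))) is a purely local match, which is the argument for relying on target combiner patterns.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node kinds; they mirror RTL codes by name only. */
enum toy_code {
    TOY_REG, TOY_VEC_SELECT, TOY_VEC_DUPLICATE, TOY_MULT, TOY_MULT_LANE
};

struct expr {
    enum toy_code code;
    int regno;              /* TOY_REG: register number                 */
    int lane;               /* TOY_VEC_SELECT, TOY_MULT_LANE: lane index */
    struct expr *op0;       /* first (or only) operand                  */
    struct expr *op1;       /* second operand                           */
};

/* Return the vec_select node if OP is (vec_duplicate (vec_select ...)),
   otherwise NULL. */
static struct expr *match_dup_of_select(struct expr *op)
{
    if (op != NULL && op->code == TOY_VEC_DUPLICATE
        && op->op0 != NULL && op->op0->code == TOY_VEC_SELECT)
        return op->op0;
    return NULL;
}

/* Rewrite (mult X (vec_duplicate (vec_select Y [lane]))) in place into
   (mult_lane X Y lane).  Multiplication commutes, so the duplicated
   operand is accepted in either position.  Returns 1 on success. */
int try_fold_mult_lane(struct expr *e)
{
    assert(e != NULL);
    if (e->code != TOY_MULT)
        return 0;

    struct expr *other = e->op0;
    struct expr *sel = match_dup_of_select(e->op1);
    if (sel == NULL) {
        other = e->op1;
        sel = match_dup_of_select(e->op0);
    }
    if (sel == NULL)
        return 0;

    e->code = TOY_MULT_LANE;
    e->lane = sel->lane;
    e->op0 = other;         /* X: the full-vector operand           */
    e->op1 = sel->op0;      /* Y: vector holding the chosen element */
    return 1;
}
```

A real combiner pattern would also have to check modes and that the lane is a constant in range; the toy ignores both.
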

In GIMPLE, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR
for vec_duplicate; adding new nodes is always painful.
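
One way to check what the middle end already produces for this idiom, without new tree codes, is to write it with the GCC vector extension and inspect the dumps. In this sketch mul_by_lane2 is a hypothetical name; the extract vc[2] and the brace initializer are the kind of source we would expect to show up as a BIT_FIELD_REF and a CONSTRUCTOR in GIMPLE, but the exact form depends on the GCC version and target.

```c
#include <assert.h>   /* static_assert (C11) */

typedef int v4si __attribute__((vector_size(16)));

static_assert(sizeof(v4si) == 4 * sizeof(int), "v4si should hold four ints");

/* Multiply every lane of vb by lane 2 of vc.  The lane index is a
   compile-time constant, which is what a by-element instruction needs. */
v4si mul_by_lane2(v4si vb, v4si vc)
{
    int t = vc[2];              /* lane extract (vec_select)  */
    v4si vt = { t, t, t, t };   /* broadcast (vec_duplicate)  */
    return vb * vt;             /* element-wise multiply      */
}
```

Compiling this with something like gcc -O2 -S -fdump-tree-optimized writes the optimized GIMPLE to a dump file and the assembly to a .s file; on an AArch64 target one would then look in the assembly for a by-element mul rather than a separate dup followed by a vector mul.
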

> This enhancement could possibly help further optimize larger workloads, such
> as solving linear systems.
>
> Regards
> VP

-- 
Marc Glisse