From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-180291-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14253 invoked by alias); 30 Sep 2013 13:19:35 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 14238 invoked by uid 89); 30 Sep 2013 13:19:35 -0000
Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 30 Sep 2013 13:19:35 +0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.1 required=5.0 tests=AWL,BAYES_00,RDNS_NONE autolearn=no version=3.3.2
X-HELO: mx2.suse.de
Received: from relay1.suse.de (unknown [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id 60C46A52C6;	Mon, 30 Sep 2013 15:19:32 +0200 (CEST)
Date: Mon, 30 Sep 2013 13:19:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Vidya Praveen <vidyapraveen@arm.com>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>, "ook@ucw.cz" <ook@ucw.cz>
Subject: Re: [RFC] Vectorization of indexed elements
In-Reply-To: <20130930125454.GD3460@e103625-lin.cambridge.arm.com>
Message-ID: <alpine.LNX.2.00.1309301504120.5759@zhemvz.fhfr.qr>
References: <20130909172533.GA25330@e103625-lin.cambridge.arm.com> <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr> <20130924150425.GE22907@e103625-lin.cambridge.arm.com> <alpine.LNX.2.00.1309251123490.29411@zhemvz.fhfr.qr> <20130927145008.GA861@e103625-lin.cambridge.arm.com> <20130927151945.GB861@e103625-lin.cambridge.arm.com> <20130930125454.GD3460@e103625-lin.cambridge.arm.com>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2013-09/txt/msg00250.txt.bz2

On Mon, 30 Sep 2013, Vidya Praveen wrote:

> On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> > [...]
> > > > > I can't really insist on the single lane load.. something like:
> > > > > 
> > > > > vc:V4SI[0] = c
> > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > > 
> > > > > Or is there any other way to do this?
> > > > 
> > > > Can you elaborate on "I can't really insist on the single lane load"?
> > > > What's the single lane load in your example? 
> > > 
> > > Loading just one lane of the vector like this:
> > > 
> > > vc:V4SI[0] = c // from the above scalar example
> > > 
> > > or 
> > > 
> > > vc:V4SI[0] = c[2] 
> > > 
> > > is what I meant by single lane load. In this example:
> > > 
> > > t = c[2] 
> > > ...
> > > vb:v4si = b[0:3] 
> > > vc:v4si = { t, t, t, t }
> > > va:v4si = vb:v4si <op> vc:v4si 
> > > 
> > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be 
> > > seen as vec_select:SI (vect_t 0) ). 
> > > 
> > > > I'd expect the instruction
> > > > pattern as quoted to just work (and I hope we expand an uniform
> > > > constructor { a, a, a, a } properly using vec_duplicate).
> > > 
> > > As much as I went through the code, this is only done using vect_init. It is
> > > not expanded as vec_duplicate from, for example, store_constructor() of expr.c
> > 
> > Do you see any issues if we expand such constructor as vec_duplicate directly 
> > instead of going through vect_init way? 
> 
> Sorry, that was a bad question.
> 
> But here's what I would like to propose as a first step. Please tell me if this
> is acceptable or if it makes sense:
> 
> - Introduce standard pattern names
> 
> "vmulim4" - vector multiply with second operand as indexed operand
> 
> Example:
> 
> (define_insn "vmuliv4si4"
>    [(set (match_operand:V4SI 0 "register_operand")
>          (mul:V4SI (match_operand:V4SI 1 "register_operand")
>                    (vec_duplicate:V4SI
>                      (vec_select:SI
>                        (match_operand:V4SI 2 "register_operand")
>                        (match_operand:V4SI 3 "immediate_operand")))))]
>  ...
> )

We could factor this by providing a standard pattern name for

(define_insn "vdupi<mode>"
  [(set (match_operand:<mode> 0 "register_operand")
        (vec_duplicate:<mode>
          (vec_select:<scalarmode>
             (match_operand:<mode> 1 "register_operand")
             (match_operand:SI 2 "immediate_operand"))))]
  ...
)

(you use V4SI for the immediate?  Ideally vdupi has another custom
mode for the vector index).
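
To make the factoring concrete, here is a rough scalar model in plain C of
what I mean: the indexed multiply is just an ordinary element-wise multiply
applied to the result of a lane broadcast. The type v4si and the helpers
vdupi(), vmul() and vmuli() are made up for illustration only; this is not
GCC code and says nothing about how a target would implement the patterns.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical stand-in for a V4SI register. */
typedef struct { int lane[LANES]; } v4si;

/* Model of the suggested "vdupi": copy lane `idx` of `src` into every lane. */
static v4si vdupi(v4si src, size_t idx)
{
    v4si r;
    assert(idx < LANES);
    for (size_t i = 0; i < LANES; i++)
        r.lane[i] = src.lane[idx];
    return r;
}

/* Ordinary element-wise vector multiply. */
static v4si vmul(v4si a, v4si b)
{
    v4si r;
    for (size_t i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Model of the proposed "vmuli": multiply by one selected lane of `b`,
   written as the composition of the two operations above. */
static v4si vmuli(v4si a, v4si b, size_t idx)
{
    return vmul(a, vdupi(b, idx));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, vmuli(a, b, 2) multiplies
every lane of a by b[2].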

Note that this factored pattern is already available as vec_perm_const!
It is simply (vec_perm_const:V4SI <source> <source> <immediate-selector>).

Which means that on the GIMPLE level we should try to combine

el_4 = BIT_FIELD_REF <v_3, ...>;
v_5 = { el_4, el_4, ... };

into

v_5 = VEC_PERM_EXPR <v_3, v_3, ...>;

which it should already do with simplify_permutation.
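
As a sanity check of that equivalence, a small plain C model (not GCC
internals) of the two forms follows. It assumes the usual VEC_PERM_EXPR
semantics, where each selector element indexes into the concatenation of
the two inputs, taken modulo twice the number of lanes. The names v4si,
v4si_sel, vec_perm(), extract_then_construct() and broadcast_by_perm() are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical stand-ins for a V4SI value and its permutation selector. */
typedef struct { int lane[LANES]; } v4si;
typedef struct { unsigned lane[LANES]; } v4si_sel;

/* Model of VEC_PERM_EXPR <v0, v1, sel>: lane i of the result is element
   sel[i] (mod 2*LANES) of the concatenation of v0 and v1. */
static v4si vec_perm(v4si v0, v4si v1, v4si_sel sel)
{
    v4si r;
    for (size_t i = 0; i < LANES; i++) {
        unsigned k = sel.lane[i] % (2 * LANES);
        r.lane[i] = (k < LANES) ? v0.lane[k] : v1.lane[k - LANES];
    }
    return r;
}

/* Before: el = BIT_FIELD_REF <v, ...>;  r = { el, el, el, el }; */
static v4si extract_then_construct(v4si v, unsigned k)
{
    assert(k < LANES);
    int el = v.lane[k];
    v4si r = {{ el, el, el, el }};
    return r;
}

/* After: r = VEC_PERM_EXPR <v, v, { k, k, k, k }>; */
static v4si broadcast_by_perm(v4si v, unsigned k)
{
    assert(k < LANES);
    v4si_sel sel = {{ k, k, k, k }};
    return vec_perm(v, v, sel);
}
```

For every valid lane index k the two functions return the same vector,
which is all the combine would rely on.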

But I'm not sure what you are after in the end ;)

Richard.