public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: gcc@gcc.gnu.org
Cc: Vidya Praveen <vidyapraveen@arm.com>, ook@ucw.cz
Subject: Re: [RFC] Vectorization of indexed elements
Date: Tue, 10 Sep 2013 08:25:00 -0000	[thread overview]
Message-ID: <alpine.LNX.2.00.1309101019060.3869@zhemvz.fhfr.qr> (raw)
In-Reply-To: <alpine.DEB.2.10.1309091949090.3565@laptop-mg.saclay.inria.fr>

On Mon, 9 Sep 2013, Marc Glisse wrote:

> On Mon, 9 Sep 2013, Vidya Praveen wrote:
> 
> > Hello,
> > 
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of the SIMD instructions that allows indexed element
> > as an operand thus reducing the need for duplication and possibly improve
> > reuse of previously loaded data.
> > 
> > Appreciate your opinion on this.
> > 
> > ---
> > 
> > A phrase like this:
> > 
> > for(i=0;i<4;i++)
> >   a[i] = b[i] <op> c[2];
> > 
> > is usually vectorized as:
> > 
> >  va:V4SI = a[0:3]
> >  vb:V4SI = b[0:3]
> >  t = c[2]
> >  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >  ...
> >  va:V4SI = vb:V4SI <op> vc:V4SI
> > 
> > But this could be simplified further if a target has instructions that
> > support
> > indexed element as a parameter. For example an instruction like this:
> > 
> >  mul v0.4s, v1.4s, v2.4s[2]
> > 
> > can perform multiplication of each element of v2.4s with the third element
> > of
> > v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> > 
> > For this to happen, vectorizer needs to understand this idiom and treat the
> > operand c[2] specially (and by taking in to consideration if the machine
> > supports indexed element as an operand for <op> through a target hook or
> > macro)
> > and consider this as vectorizable statement without having to duplicate the
> > elements explicitly.
> > 
> > There are fews ways this could be represented at gimple:
> > 
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  ...
> > 
> > or by allowing a vectorizer treat an indexed element as a valid operand in a
> > vectorizable statement:
> 
> Might as well allow any scalar then...

I agree.  The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form
would necessarily be two extra separate statements and thus subject
to CSE obfuscating it enough for RTL expansion to no longer notice it.

That said, allowing mixed scalar/vector ops isn't very nice and
your scheme can be simplified by just using

  vc:V4SI = VEC_DUPLICATE_EXPR <...>
  va:V4SI = vb:V4SI <op> vc:V4SI

where the expander only has to see that vc:V4SI is defined by
a duplicate.

> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >  ...
> > 
> > For the sake of explanation, the above two representations assumes that
> > c[0:3] is loaded in vc for some other use and reused here. But when c[2] is
> > the
> > only use of 'c' then it may be safer to just load one element and use it
> > like
> > this:
> > 
> >  vc:V4SI[0] = c[2]
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> > 
> > This could also mean that expressions involving scalar could be treated
> > similarly. For example,
> > 
> >  for(i=0;i<4;i++)
> >    a[i] = b[i] <op> c
> > 
> > could be vectorized as:
> > 
> >  vc:V4SI[0] = c
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> > 
> > Such a change would also require new standard pattern names to be defined
> > for
> > each <op>.
> > 
> > Alternatively, having something like this:
> > 
> >  ...
> >  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  va:V4SI = vb:V4SI <op> vt:V4SI
> >  ...
> > 
> > would remove the need to introduce several new standard pattern names but
> > have
> > just one to represent vec_duplicate(vec_select()) but ofcourse this will
> > expect
> > the target to have combiner patterns.
> 
> The cost estimation wouldn't be very good, but aren't combine patterns enough
> for the whole thing? Don't you model your mul instruction as:
> 
> (mult:V4SI
>   (match_operand:V4SI)
>   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))
> 
> anyway? Seems that combine should be able to handle it. What currently happens
> that we fail to generate the right instruction?
> 
> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR for
> vec_duplicate, adding new nodes is always painful.

True, though CONSTRUCTOR isn't a good vec_duplicate primitive.  But yes,
we have it that way at the moment and indeed adding new nodes is always
painful.

> > This enhancement could possibly help further optimizing larger scenarios
> > such
> > as linear systems.

Given that the vectorizer already handles all cases you quote but
just the expansion doesn't use the targets special abilities - can't
you just teach the expander to lookup the definition of the
vectors and see if it is an uniform CONSTRUCTOR?

Richard.

  reply	other threads:[~2013-09-10  8:25 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-09 17:25 Vidya Praveen
2013-09-09 18:02 ` Marc Glisse
2013-09-10  8:25   ` Richard Biener [this message]
2013-09-24 15:03     ` Vidya Praveen
2013-09-25  9:22       ` Richard Biener
2013-09-30 13:01         ` Vidya Praveen
2013-09-24 15:04   ` Vidya Praveen
2013-09-25  9:25     ` Richard Biener
2013-09-27 14:50       ` Vidya Praveen
2013-09-27 15:19         ` Vidya Praveen
2013-09-30 12:55           ` Vidya Praveen
2013-09-30 13:19             ` Richard Biener
2013-09-30 14:00               ` Vidya Praveen
2013-10-01  8:26                 ` Richard Biener
2013-10-11 14:54                   ` Vidya Praveen
2013-10-11 15:05                     ` Jakub Jelinek
2013-12-04 17:07                       ` Vidya Praveen
2013-10-14  8:05                     ` Richard Biener
2013-12-04 16:10                       ` Vidya Praveen
2013-12-06 11:48                         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LNX.2.00.1309101019060.3869@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc@gcc.gnu.org \
    --cc=ook@ucw.cz \
    --cc=vidyapraveen@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).