public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: Alan Hayward <alan.hayward@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
Date: Fri, 11 Sep 2015 13:23:00 -0000	[thread overview]
Message-ID: <1441977591.2795.11.camel@gnopaine> (raw)
In-Reply-To: <D2184E16.8003%alan.hayward@arm.com>

Hi Alan,

I probably wasn't clear enough.  The implementation in the vectorizer is
fine and I'm not asking that to change per target.  What I'm objecting
to is the equivalence between a REDUC_MAX_EXPR and a cost associated
with vec_to_scalar.  This assumes that the back end will implement a
REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
But those back ends should be free to model the cost of the
REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
ARM, this cost will be the same as a vec_to_scalar.  For others, it may
not be; for powerpc, it certainly will not be.

We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
expansion, and therefore it is not correct for us to explode this in
tree-vect-generic.  This would expand the code size without providing
any significant optimization opportunity, and could reduce the ability
to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
gimple vectorizers.

I apologize if my loose use of language confused the issue.  It isn't
the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
that are used by it.

(The costs in powerpc won't be enormous, but they are definitely
mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
instructions, where n is the number of elements in the mode being
vectorized.)

A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
that has to be broadcast back to a vector, and the best way to implement
it for us already has the max value in all positions of a vector.  But
that is something we should be able to fix with simplify-rtx in the back
end.

Thanks,
Bill


On Fri, 2015-09-11 at 10:15 +0100, Alan Hayward wrote:
> Hi Bill,
> 
> I’d be a bit worried about asking the backend for the cost of a
> COND_REDUCTION, as that will rely on the backend understanding the
> implementation the vectorizer is using - every time the vectorizer
> changed, the backends would need to be updated too. I’m hoping soon to get
> together a patch to reduce the stmts produced on the simpler cases, which
> would require a different set of costings. I can also imagine further
> improvements being added for other special cases over time. Having the
> backends understand every variation would be a little cumbersome.
> 
> As it stands today, we correctly exit the optimisation if max reduction
> isn’t supported in hardware, which is what the cost model is expecting.
> 
> 
> If power wanted to use this implementation, then I think it’d probably
> need some code in tree-vect-generic.c to implement on emulated max
> reduction, which would then require updates to the costs modelling of
> anything that uses max reduction (not just cond reduction). All of that is
> outside the scope of this patch.
> 
> 
> Thanks,
> Alan.
> 
> On 10/09/2015 23:14, "Bill Schmidt" <wschmidt@linux.vnet.ibm.com> wrote:
> 
> >Hi Alan,
> >
> >The cost modeling of the epilogue code seems pretty target-specific ("An
> >EQ stmt and an AND stmt, reduction of the max index and a reduction of
> >the found values, a broadcast of the max value," resulting in two
> >vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
> >this will not represent the cost accurately, and the cost will indeed be
> >quite different depending on the mode (logarithmic in the number of
> >elements).  I think that you need to create a new entry in
> >vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
> >each target to calculate the cost appropriately.
> >
> >(Powerpc doesn't have a max-reduction hardware instruction, but because
> >the reduction will be only in the epilogue code, it may still be
> >profitable for us to generate the somewhat expensive reduction sequence
> >in order to vectorize the loop.  But we definitely want to model it as
> >costly in and of itself.  Also, the sequence will produce the maximum
> >value in all positions without a separate broadcast.)
> >
> >Thanks,
> >Bill
> >
> >On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
> >> Hi,
> >> This patch (attached) adds support for vectorizing conditional
> >>expressions
> >> (PR 65947), for example:
> >> 
> >> int condition_reduction (int *a, int min_v)
> >> {
> >>   int last = 0;
> >>   for (int i = 0; i < N; i++)
> >>     if (a[i] < min_v)
> >>       last = a[i];
> >>   return last;
> >> }
> >> 
> >> To do this the loop is vectorised to create a vector of data results (ie
> >> of matching a[i] values). Using an induction variable, an additional
> >> vector is added containing the indexes where the matches occured. In the
> >> function epilogue this is reduced to a single max value and then used to
> >> index into the vector of data results.
> >> When no values are matched in the loop, the indexes vector will contain
> >> all zeroes, eventually matching the first entry in the data results
> >>vector.
> >> 
> >> To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This
> >>is
> >> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> >> REDUC_MAX_EXPR is not supported for the required modes, failing the
> >> vectorization. On mips it complains that the required vcond expression
> >>is
> >> not supported. It is suggested the relevant backend experts add the
> >> required backend support.
> >> 
> >> Using a simple testcase based around a large number of N and run on an
> >> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> >> it's original time.
> >> 
> >> This patch caused binary differences in three spec2006 binaries on
> >>aarch64
> >> - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> >> showed no improvement or degregation in runtime.
> >> 
> >> 
> >> In the near future I hope to submit a further patch (as PR 66558) which
> >> optimises the case where the result is simply the index of the loop, for
> >> example:
> >> int condition_reduction (int *a, int min_v)
> >> {
> >>   int last = 0;
> >>   for (int i = 0; i < N; i++)
> >>     if (a[i] < min_v)
> >>       last = i;
> >>   return last;
> >> }
> >> In this case a lot of the new code can be optimized away.
> >> 
> >> I have run check for aarch64, arm and x86 and have seen no regressions.
> >> 
> >> 
> >> Changelog:
> >> 
> >>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
> >> 
> >>         PR tree-optimization/65947
> >>         * tree-vect-loop.c
> >>         (vect_is_simple_reduction_1): Find condition reductions.
> >>         (vect_model_reduction_cost): Add condition reduction costs.
> >>         (get_initial_def_for_reduction): Add condition reduction initial
> >> var.
> >>         (vect_create_epilog_for_reduction): Add condition reduction
> >>epilog.
> >>         (vectorizable_reduction): Condition reduction support.
> >>         * tree-vect-stmts.c
> >>         (vectorizable_condition): Add vect reduction arg
> >>         * doc/sourcebuild.texi (Vector-specific attributes): Document
> >>         vect_max_reduc
> >> 
> >>     testsuite/Changelog:
> >> 
> >>         PR tree-optimization/65947
> >>         * lib/target-supports.exp
> >>         (check_effective_target_vect_max_reduc): Add.
> >>         * gcc.dg/vect/pr65947-1.c: New test.
> >>         * gcc.dg/vect/pr65947-2.c: New test.
> >>         * gcc.dg/vect/pr65947-3.c: New test.
> >>         * gcc.dg/vect/pr65947-4.c: New test.
> >>         * gcc.dg/vect/pr65947-5.c: New test.
> >>         * gcc.dg/vect/pr65947-6.c: New test.
> >>         * gcc.dg/vect/pr65947-7.c: New test.
> >>         * gcc.dg/vect/pr65947-8.c: New test.
> >>         * gcc.dg/vect/pr65947-9.c: New test.
> >>         * gcc.dg/vect/pr65947-10.c: New test.
> >>         * gcc.dg/vect/pr65947-11.c: New test.
> >> 
> >> 
> >> 
> >> Thanks,
> >> Alan
> >> 
> >> 
> >
> >
> 
> 


  reply	other threads:[~2015-09-11 13:19 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-10 15:11 Alan Hayward
2015-09-10 22:34 ` Bill Schmidt
2015-09-11  9:19   ` Alan Hayward
2015-09-11 13:23     ` Bill Schmidt [this message]
2015-09-11 13:55       ` Ramana Radhakrishnan
2015-09-11 14:41         ` Richard Sandiford
2015-09-11 15:14           ` Bill Schmidt
2015-09-11 15:30             ` Richard Sandiford
2015-09-11 15:50               ` Bill Schmidt
2015-09-11 16:54                 ` Ramana Radhakrishnan
2015-09-15 11:47                   ` Richard Biener
2015-09-14  9:50       ` Alan Lawrence
2015-09-14 14:20         ` Bill Schmidt
2015-09-15 12:10 ` Richard Biener
2015-09-15 15:41   ` Alan Hayward
2015-09-18 12:22     ` Richard Biener
2015-09-18 13:36       ` Alan Lawrence
2015-09-18 14:14         ` Alan Hayward
     [not found]         ` <D221D55E.8386%alan.hayward@arm.com>
2015-09-23 16:07           ` Alan Hayward
2015-09-30 12:49             ` Richard Biener
2015-10-01 15:22               ` Alan Hayward
2015-09-15 12:12 ` Richard Biener
2015-10-19  8:34 Alan Hayward
2015-10-21 10:46 ` Richard Biener
2015-10-22 14:18 ` Alan Lawrence
2015-10-22 14:23   ` Alan Hayward

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1441977591.2795.11.camel@gnopaine \
    --to=wschmidt@linux.vnet.ibm.com \
    --cc=alan.hayward@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).