From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] improve BB vectorization dump locations
Date: Fri, 11 Sep 2020 11:37:15 +0100
Message-ID: <mptv9gkha5g.fsf@arm.com>
In-Reply-To: <nycvar.YFH.7.76.2009111114540.9963@zhemvz.fhfr.qr> (Richard Biener's message of "Fri, 11 Sep 2020 11:20:04 +0200 (CEST)")

Richard Biener <rguenther@suse.de> writes:
> This tries to improve BB vectorization dumps by providing more
> precise locations.  Currently the vect_location is simply the
> very last stmt in a basic-block that has a location.  So for
>
> double a[4], b[4];
> int x[4], y[4];
> void foo()
> {
>   a[0] = b[0]; // line 5
>   a[1] = b[1];
>   a[2] = b[2];
>   a[3] = b[3];
>   x[0] = y[0]; // line 9
>   x[1] = y[1];
>   x[2] = y[2];
>   x[3] = y[3];
> } // line 13
>
> we show the user with -O3 -fopt-info-vec
>
> t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
>
> while with the patch we point to both independently vectorized
> opportunities:
>
> t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
> t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
>
> There's the possibility that the location regresses in case the
> root stmt in the SLP instance has no location.  For an SLP subgraph
> with multiple entries the location also picks one entry at random;
> I'm not sure in which cases we'd want to dump both.
>
> Still, as the plan is to extend the basic-block vectorization
> scope from a single basic block to multiple ones, this is a first
> step towards preserving something sensible.
>
> Implementation-wise this makes both costing and code-generation
> happen on the subgraphs as analyzed.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Richard - is iteration over vector modes for BB vectorization
> still important now that we have related_vector_type and thus
> no longer consider only a fixed size?  If so, it will probably
> make sense to somehow still iterate even if some SLP subgraph
> was vectorized?  It also looks like BB vectorization was never
> updated to consider multiple modes based on cost; it will still
> pick the first opportunity.  For BB vectorization we also have
> the code that re-tries SLP discovery by splitting the store
> group.  So what are your overall thoughts on this?

I think there might be different answers for “in principle” and
“in practice”. :-)

In principle, there's no one right answer to (say) “what vector mode
should I use for 4 32-bit integers?”.  If the block is only operating on
that type, then VNx4SI is the right choice for 128-bit SVE.  But if the
block is mostly operating on 4 64-bit integers and just converting to
32-bit integers for a small region, then it might be better to use
2 VNx2SIs instead (paired with 2 VNx2DIs).
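
As a made-up illustration of that second case (not from the patch):

  long c[4], d[4];
  int e[4];
  void bar()
  {
    c[0] = d[0] + 1;   // bulk of the block works on 64-bit lanes
    c[1] = d[1] + 1;
    c[2] = d[2] + 1;
    c[3] = d[3] + 1;
    e[0] = (int) d[0]; // small region narrows to 32 bits
    e[1] = (int) d[1];
    e[2] = (int) d[2];
    e[3] = (int) d[3];
  }

Here (for 128-bit SVE) the stores to c want VNx2DI, and keeping the
narrowed values in 2 VNx2SIs lines their lanes up with the VNx2DIs,
whereas a single packed VNx4SI wouldn't.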

In practice, one situation in which the current loop might be needed
is pattern statements.  There we assign a vector type during pattern
recognition, based only on the element type.  So in that situation,
the first pass (with the autodetected base vector mode) will not take
the number of scalar stmts into account.
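
A concrete example of the kind of thing I mean (illustrative only):

  short s[8], t[8];
  int p[8];
  void baz()
  {
    p[0] = s[0] * t[0]; // recognised as a widening-multiply pattern
    p[1] = s[1] * t[1];
    p[2] = s[2] * t[2];
    p[3] = s[3] * t[3];
  }

The pattern stmt that replaces each multiplication gets its vector
type from the element type alone, before we know how many scalar
stmts the group contains.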

Also, although SLP currently only operates on full vectors,
I was hoping we would eventually support predication for SLP too.
At that point, the number of scalar statements wouldn't directly
determine the number of vector lanes.
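
For example (hypothetically), a group of three stores such as:

  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];

could then be done with a single predicated vector store whose
excess lanes are masked off, rather than needing exactly as many
lanes as there are scalar stmts.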

On the cost thing: it would be better to try all four and pick the one
with the lowest cost, but given your in-progress changes, it seemed like
a dead end to do that with the current code.

It sounded like the general direction here was to build an SLP graph
and “solve” the vector type assignment problem in a more global way,
once we have a view of the entire graph.  Is that right?  If so,
then at that point we might be able to do something more intelligent
than just iterate over all the options.  (Although at the same time,
iterating over all the options on a fixed (sub?)graph would be cheaper
than what we do now.)
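
Roughly the shape I have in mind, with made-up helper names
(slp_subgraph_cost and slp_subgraph_codegen don't exist; this is
only a sketch of the idea):

  /* Try each candidate vector mode on the already-built subgraph
     and keep the cheapest one that works.  */
  machine_mode best_mode = VOIDmode;
  int best_cost = INT_MAX;
  for (unsigned i = 0; i < vector_modes.length (); ++i)
    {
      int cost = slp_subgraph_cost (subgraph, vector_modes[i]);
      if (cost >= 0 && cost < best_cost)
	{
	  best_cost = cost;
	  best_mode = vector_modes[i];
	}
    }
  if (best_mode != VOIDmode)
    slp_subgraph_codegen (subgraph, best_mode);

(with a negative cost meaning "vectorization not possible in this
mode").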

Thanks,
Richard
