public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1
Date: Wed, 14 Jun 2023 14:14:02 +0100	[thread overview]
Message-ID: <mptilbqgp9h.fsf@arm.com> (raw)
In-Reply-To: <20230614094639.B578B3858430@sourceware.org> (Richard Biener via Gcc-patches's message of "Wed, 14 Jun 2023 09:46:18 +0000 (UTC)")

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Currently vect_determine_partial_vectors_and_peeling will decide
> to apply fully masking to the main loop despite
> --param vect-partial-vector-usage=1 when the currently analyzed
> vector mode results in a vectorization factor that's bigger
> than the number of scalar iterations.  That's undesirable for
> targets where a vector mode can handle both partial vector and
> non-partial vector vectorization.  I understand that for AARCH64
> we have SVE and NEON but SVE can only do partial vector and
> NEON only non-partial vector vectorization, plus the target
> chooses to let cost comparison decide the vector mode to use.

SVE can do both (and does non-partial for things that can't yet be
predicated, like reversing loads).  But yeah, NEON can only do
non-partial.

> For x86 and the upcoming AVX512 partial vector support the
> story is different, the target chooses the first (and largest)
> vector mode that can successfully used for vectorization.  But
> that means with --param vect-partial-vector-usage=1 we will
> always choose AVX512 with partial vectors for the main loop
> even if, for example, V4SI would be a perfect fit with full
> vectors and no required epilog!

Sounds like a good candidate for VECT_COMPARE_COSTS.  Did you
try using that?

> The following tries to find the appropriate condition for
> this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
> on the main loop when --param vect-partial-vector-usage=1 will
> hurt AARCH64?

Yeah, I'd expect so.

> Incidentially looking up the docs for
> vect-partial-vector-usage suggests that it's not supposed to
> control epilog vectorization but instead
> "1 allows partial vector loads and stores if vectorization removes the
> need for the code to iterate".  That's probably OK in the end
> but if there's a fixed size vector mode that allows the same thing
> without using masking that would be better.
>
> I wonder if we should special-case known niter (bounds) somehow
> when analyzing the vector modes and override the targets sorting?
>
> Maybe we want a new --param in addition to vect-epilogues-nomask
> and vect-partial-vector-usage to say we want masked epilogues?
>
> 	* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
> 	For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
> 	as only applying to epilogues.
> ---
>  gcc/tree-vect-loop.cc | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9be66b8fbc5..9323aa572d4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
>  	  && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>  	  && !vect_known_niters_smaller_than_vf (loop_vinfo))
>  	LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -      else
> +      /* Avoid using a large fixed size vectorization mode with masking
> +	 for the main loop when we were asked to only use masking for
> +	 the epilog.
> +	 ???  Ideally we'd start analysis with a better sized mode,
> +	 the param_vect_partial_vector_usage == 2 case suffers from
> +	 this as well.  But there's a catch-22.  */
> +      else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +		 && param_vect_partial_vector_usage == 1
> +		 && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))

I don't think is_constant is a good thing to test here.  The way things
work for SVE is essentially the same for VL-agnostic and VL-specific.

Also, I think this hard-codes the assumption that the smallest mode
isn't maskable.  Wouldn't it spuriously fail vectorisation if there
was no smaller mode available?

Similarly, it looks like it could fail for AVX512 if processing 511
characters, whereas I'd have expected AVX512 to still be the right
choice there.

If VECT_COMPARE_COSTS seems too expensive, we could try to look
for cases where a vector mode later in the list gives a VF that
is exactly equal to the number of scalar iterations.  (Exactly
*divides* the number of scalar iterations would be less clear-cut IMO.)

But converting a vector mode into a VF isn't trivial with our
current vectoriser structures, so I'm not sure how much of a
saving it would be over VECT_COMPARE_COSTS.  And it would be much
more special-purpose than VECT_COMPARE_COSTS.

Thanks,
Richard


>  	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>      }

       reply	other threads:[~2023-06-14 13:14 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230614094639.B578B3858430@sourceware.org>
2023-06-14 13:14 ` Richard Sandiford [this message]
2023-06-14 13:44   ` Richard Biener
2023-06-14  9:46 Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptilbqgp9h.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).