From: Richard Sandiford <richard.sandiford@arm.com>
To: Hao Liu OS via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Hao Liu OS <hliu@os.amperecomputing.com>
Subject: Re: [PATCH] Vect: select small VF for epilog of unrolled loop (PR tree-optimization/110474)
Date: Wed, 05 Jul 2023 20:37:47 +0100
Message-ID: <mptbkgqqhdg.fsf@arm.com>
In-Reply-To: <SJ2PR01MB8635E49C6DC6B89D31D6390FE12FA@SJ2PR01MB8635.prod.exchangelabs.com> (Hao Liu's message of "Wed, 5 Jul 2023 08:46:26 +0000")
Hao Liu OS via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
> the VFs of both the main and epilog loops are enlarged. The epilog vect loop
> is meant for loops with small iteration counts, so a large VF may hurt
> performance.
>
> This patch unscales the main loop VF by suggested_unroll_factor while selecting
> the epilog loop VF, so that it is the same as for a vectorized loop without
> unrolling (i.e. suggested_unroll_factor = 1).

I agree that unrolling the main loop shouldn't cause more iterations
to be handled by the scalar code. It would be nice to support multiple
epilogues, but that's probably a lot of work.

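As an illustration of the point about scalar iterations, here is a minimal
sketch (not GCC code) that models how N elements are split between the main
vect loop, the epilog vect loop and scalar code, using the thresholds from the
testcase comment in the patch. The struct and function names are hypothetical,
and the model assumes that the epilog vect loop may iterate more than once.

```c
#include <assert.h>

/* Hypothetical model of the dispatch structure described in the testcase
   comment; none of these names exist in GCC.  */
struct split
{
  int main_elems;   /* Elements handled by the main vect loop.  */
  int epilog_elems; /* Elements handled by the epilog vect loop.  */
  int scalar_elems; /* Elements left for scalar code.  */
};

static struct split
split_iterations (int n, int main_vf, int epilog_vf)
{
  assert (n >= 0 && main_vf > 0 && epilog_vf > 0);

  struct split s;

  /* The main vect loop consumes whole multiples of its VF.  */
  s.main_elems = n / main_vf * main_vf;
  n -= s.main_elems;

  /* Assumption: the epilog vect loop consumes whole multiples of its
     own VF from whatever the main loop left over.  */
  s.epilog_elems = n / epilog_vf * epilog_vf;
  n -= s.epilog_elems;

  /* Everything else falls through to scalar code.  */
  s.scalar_elems = n;
  return s;
}
```

With a main VF of 32 and N = 15, an epilog VF of 16 leaves all 15 elements to
scalar code, whereas an epilog VF of 8 leaves only 7.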
> gcc/ChangeLog:
>
> PR tree-optimization/110474
> * tree-vect-loop.cc (vect_analyze_loop_2): Unscale the VF by suggested
> unroll factor while selecting the epilog vect loop VF.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/pr110474.c: New testcase.

OK, thanks.

Richard

> ---
> gcc/testsuite/gcc.target/aarch64/pr110474.c | 37 +++++++++++++++++++++
> gcc/tree-vect-loop.cc | 16 +++++----
> 2 files changed, 47 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110474.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110474.c b/gcc/testsuite/gcc.target/aarch64/pr110474.c
> new file mode 100644
> index 00000000000..e548416162a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110474.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 -fdump-tree-vect-details --param aarch64-vect-unroll-limit=2" } */
> +/* { dg-final { scan-tree-dump "Choosing vector mode V8HI" "vect" } } */
> +/* { dg-final { scan-tree-dump "Choosing epilogue vector mode V8QI" "vect" } } */
> +
> +/* Do not increase the vectorization factor of the epilog vectorized loop
> + for a loop with suggested_unroll_factor > 1.
> +
> + before (suggested_unroll_factor=1):
> + if N >= 16:
> + main vect loop
> + if N >= 8:
> + epilog vect loop
> + scalar code
> +
> + before (suggested_unroll_factor=2):
> + if N >= 32:
> + main vect loop
> + if N >= 16: // May fail to execute vectorized code (e.g. N is 8)
> + epilog vect loop
> + scalar code
> +
> + after (suggested_unroll_factor=2):
> + if N >= 32:
> + main vect loop
> + if N >= 8: // The same VF as suggested_unroll_factor=1
> + epilog vect loop
> + scalar code */
> +
> +int
> +foo (short *A, char *B, int N)
> +{
> + int sum = 0;
> + for (int i = 0; i < N; ++i)
> + sum += A[i] * B[i];
> + return sum;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3b46c58a8d8..4d9abd035ea 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3021,12 +3021,16 @@ start_over:
> to be able to handle fewer than VF scalars, or needs to have a lower VF
> than the main loop. */
> if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> - && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> - && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> - LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
> - return opt_result::failure_at (vect_location,
> - "Vectorization factor too high for"
> - " epilogue loop.\n");
> + && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> + {
> + poly_uint64 unscaled_vf
> + = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
> + orig_loop_vinfo->suggested_unroll_factor);
> + if (maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), unscaled_vf))
> + return opt_result::failure_at (vect_location,
> + "Vectorization factor too high for"
> + " epilogue loop.\n");
> + }
>
> /* Decide whether this loop_vinfo should use partial vectors or peeling,
> assuming that the loop will be used as a main loop. We will redo
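
For readers less familiar with the poly_uint64 helpers, the check changed in
the hunk above can be approximated with plain integers. This is only a sketch
under the assumption of constant VFs: the function name is hypothetical,
ordinary division stands in for exact_div, and `<` stands in for the negation
of maybe_ge. As in the patch, the check only matters when the candidate
epilogue loop cannot use partial vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the epilogue VF check, using unsigned
   integers instead of poly_uint64.  Returns true if a candidate
   epilogue VF is acceptable for the given main loop.  */
static bool
epilogue_vf_ok (unsigned epilog_vf, unsigned main_vf, unsigned unroll_factor)
{
  /* exact_div requires the division to leave no remainder.  */
  assert (unroll_factor > 0 && main_vf % unroll_factor == 0);

  /* Undo the scaling that unrolling applied to the main loop VF.  */
  unsigned unscaled_vf = main_vf / unroll_factor;

  /* The epilogue must have a strictly lower VF than the unscaled
     main loop, otherwise the candidate is rejected.  */
  return epilog_vf < unscaled_vf;
}
```

Using the numbers from the testcase comment, a main VF of 32 with an unroll
factor of 2 gives an unscaled VF of 16, so an epilogue VF of 8 is accepted and
an epilogue VF of 16 is rejected, matching the case without unrolling.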