* [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
@ 2021-05-24 6:17 Andre Vieira (lists)
2021-05-24 7:21 ` Kewen.Lin
2021-05-24 10:30 ` Richard Sandiford
0 siblings, 2 replies; 5+ messages in thread
From: Andre Vieira (lists) @ 2021-05-24 6:17 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Richard Biener
[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]
Hi,
When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
uses an unpredicated (all-true predicate for SVE) main loop and a
predicated tail loop. As implemented, both loops appear to use the same
vector mode, which means the tail loop isn't an actual loop: it executes
only one iteration.
This patch uses knowledge of the conditions for entering the epilogue
loop to derive a potentially more restrictive upper bound on its
iteration count.
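As a rough illustration of the bound arithmetic, here is a minimal sketch
that assumes every quantity is a compile-time constant. The patch itself
works on poly_uint64 values through upper_bound () and
can_div_away_from_zero_p (); the helper names max3 and
narrow_epilogue_latch_bound below are hypothetical and do not exist in
GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest of three values, standing in for the two
   nested upper_bound () calls in the patch.  */
static uint64_t
max3 (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t m = a > b ? a : b;
  return m > c ? m : c;
}

/* Hypothetical helper: narrow the epilogue's latch-count upper bound.
   Assumes main_vf >= 1 and epilogue_vf >= 1, and that all values are
   plain constants rather than the poly_uint64 values GCC uses.  */
static uint64_t
narrow_epilogue_latch_bound (uint64_t main_vf, uint64_t cost_threshold,
			     uint64_t versioning_threshold,
			     uint64_t epilogue_vf, uint64_t current_bound)
{
  /* Largest of the main loop's VF and its two thresholds.  */
  uint64_t main_iters = max3 (main_vf, cost_threshold, versioning_threshold);

  /* Divide by the epilogue's VF, rounding away from zero (that is, up,
     since the operands are unsigned).  */
  uint64_t bound = (main_iters + epilogue_vf - 1) / epilogue_vf;

  /* The "- 1" converts an iteration count to a latch count.  */
  uint64_t latch_bound = bound - 1;

  /* Only ever tighten the existing bound.  */
  return latch_bound < current_bound ? latch_bound : current_bound;
}
```

With equal vectorization factors and thresholds no larger than the VF,
for example narrow_epilogue_latch_bound (8, 0, 0, 8, 100), the result is
a latch count of 0, which corresponds to the single-iteration tail
described above.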
Regression tested on aarch64-linux-gnu and also ran the testsuite using
'--param vect-partial-vector-usage=1' detecting no ICEs and no execution
failures.
Would be good to have this tested for PPC too as I believe they are the
main users of the --param vect-partial-vector-usage=1 option. Can
someone help me test (and maybe even benchmark?) this on a PPC target?
Kind regards,
Andre
gcc/ChangeLog:
* tree-vect-loop.c (vect_transform_loop): Use main loop's
various thresholds to narrow the upper bound on epilogue
iterations.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
[-- Attachment #2: epilogue_bound.patch --]
[-- Type: text/plain, Size: 2397 bytes --]
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
new file mode 100644
index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
+
+void
+foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ c[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
/* In these calculations the "- 1" converts loop iteration counts
back to latch counts. */
if (loop->any_upper_bound)
- loop->nb_iterations_upper_bound
- = (final_iter_may_be_partial
- ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
- lowest_vf) - 1
- : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
- lowest_vf) - 1);
+ {
+ loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+ loop->nb_iterations_upper_bound
+ = (final_iter_may_be_partial
+ ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
+ lowest_vf) - 1
+ : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
+ lowest_vf) - 1);
+ if (main_vinfo)
+ {
+ unsigned int bound;
+ poly_uint64 main_iters
+ = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
+ LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
+ main_iters
+ = upper_bound (main_iters,
+ LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
+ if (can_div_away_from_zero_p (main_iters,
+ LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+ &bound))
+ loop->nb_iterations_upper_bound
+ = wi::umin ((widest_int) (bound - 1),
+ loop->nb_iterations_upper_bound);
+ }
+ }
if (loop->any_likely_upper_bound)
loop->nb_iterations_likely_upper_bound
= (final_iter_may_be_partial
* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
2021-05-24 6:17 [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue Andre Vieira (lists)
@ 2021-05-24 7:21 ` Kewen.Lin
2021-05-25 8:42 ` Kewen.Lin
2021-05-24 10:30 ` Richard Sandiford
1 sibling, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2021-05-24 7:21 UTC (permalink / raw)
To: Andre Vieira (lists); +Cc: Richard Sandiford, Richard Biener, gcc-patches
Hi Andre,
on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
>
> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
>
Thanks for doing this! I can test it on Power10, which enables this parameter
by default, and also evaluate its impact on SPEC2017 Ofast/unroll.
Do you have any preference for the baseline commit? I'll use r12-0 if that's fine.
BR,
Kewen
* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
2021-05-24 6:17 [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue Andre Vieira (lists)
2021-05-24 7:21 ` Kewen.Lin
@ 2021-05-24 10:30 ` Richard Sandiford
1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2021-05-24 10:30 UTC (permalink / raw)
To: Andre Vieira (lists); +Cc: gcc-patches, Richard Biener
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
> uses an unpredicated (all-true predicate for SVE) main loop and a
> predicated tail loop. The way this was implemented seems to mean it
> re-uses the same vector-mode for both loops, which means the tail loop
> isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue
> loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the
> main users of the --param vect-partial-vector-usage=1 option. Can
> someone help me test (and maybe even benchmark?) this on a PPC target?
>
> Kind regards,
> Andre
LGTM. OK if no objections and if the Power testing comes back clean.
Thanks,
Richard
> gcc/ChangeLog:
>
> * tree-vect-loop.c (vect_transform_loop): Use main loop's
> various thresholds to narrow the upper bound on epilogue
> iterations.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
> +
> +void
> +foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + c[i] = a[i] + b[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> /* In these calculations the "- 1" converts loop iteration counts
> back to latch counts. */
> if (loop->any_upper_bound)
> - loop->nb_iterations_upper_bound
> - = (final_iter_may_be_partial
> - ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> - lowest_vf) - 1
> - : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> - lowest_vf) - 1);
> + {
> + loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> + loop->nb_iterations_upper_bound
> + = (final_iter_may_be_partial
> + ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> + lowest_vf) - 1
> + : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> + lowest_vf) - 1);
> + if (main_vinfo)
> + {
> + unsigned int bound;
> + poly_uint64 main_iters
> + = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
> + LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
> + main_iters
> + = upper_bound (main_iters,
> + LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
> + if (can_div_away_from_zero_p (main_iters,
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> + &bound))
> + loop->nb_iterations_upper_bound
> + = wi::umin ((widest_int) (bound - 1),
> + loop->nb_iterations_upper_bound);
> + }
> + }
> if (loop->any_likely_upper_bound)
> loop->nb_iterations_likely_upper_bound
> = (final_iter_may_be_partial
* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
2021-05-24 7:21 ` Kewen.Lin
@ 2021-05-25 8:42 ` Kewen.Lin
2021-06-03 12:42 ` Andre Vieira (lists)
0 siblings, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2021-05-25 8:42 UTC (permalink / raw)
To: Andre Vieira (lists); +Cc: Richard Sandiford, Richard Biener, gcc-patches
on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Andre,
>
> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
>>
>> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
>>
>> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
>>
>> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
>>
>
>
> Thanks for doing this! I can test it on Power10 which enables this parameter
> by default, also evaluate its impact on SPEC2017 Ofast/unroll.
>
Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
SPEC2017 run didn't show any remarkable improvement/degradation.
BR,
Kewen
* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
2021-05-25 8:42 ` Kewen.Lin
@ 2021-06-03 12:42 ` Andre Vieira (lists)
0 siblings, 0 replies; 5+ messages in thread
From: Andre Vieira (lists) @ 2021-06-03 12:42 UTC (permalink / raw)
To: Kewen.Lin; +Cc: Richard Sandiford, Richard Biener, gcc-patches
Thank you Kewen!!
I will apply this now.
BR,
Andre
On 25/05/2021 09:42, Kewen.Lin wrote:
> on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Andre,
>>
>> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>>> Hi,
>>>
>>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
>>>
>>> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
>>>
>>> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
>>>
>>> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
>>>
>>
>> Thanks for doing this! I can test it on Power10 which enables this parameter
>> by default, also evaluate its impact on SPEC2017 Ofast/unroll.
>>
> Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
> SPEC2017 run didn't show any remarkable improvement/degradation.
>
> BR,
> Kewen